🤖 AI Summary
This work investigates training-data inversion in latent diffusion models (LDMs), focusing on the non-uniform impact of the autoencoder's geometric structure on model memorization. We propose a latent-dimension ranking method based on the decoder pullback metric, which, for the first time, reveals the heterogeneous contributions of individual latent dimensions to membership inference attacks (MIAs). By integrating score-matching-based MIAs with this dimension-sensitive analysis, we significantly improve privacy attack performance across multiple benchmark datasets: average AUROC increases by 2.7%, and TPR@1%FPR rises by 6.42%, notably strengthening confidence in membership identification in low false-positive regimes. Our findings establish a fundamental link between latent-space geometry in LDMs and privacy leakage, offering new insight into the memorization mechanisms of generative models and informing the design of principled privacy-preserving strategies.
📝 Abstract
The recovery of training data from generative models ("model inversion") has been extensively studied for diffusion models operating in the data domain. However, inversion techniques applied to latent-space generative models, e.g., latent diffusion models (LDMs), have largely ignored the encoder/decoder pair and the corresponding latent codes. In this work we report two key findings: (1) the diffusion model exhibits non-uniform memorization across latent codes, tending to overfit samples located in high-distortion regions of the decoder pullback metric; and (2) even within a single latent code, different dimensions contribute unequally to memorization. We introduce a principled method to rank latent dimensions by their per-dimension contribution to the decoder pullback metric, identifying those most responsible for memorization. Empirically, removing less-memorizing dimensions when computing attack statistics for a score-based membership inference attack significantly improves performance, with average AUROC gains of 2.7% and substantial increases in TPR@1%FPR (6.42%) across diverse datasets including CIFAR-10, CelebA, ImageNet-1K, Pokémon, MS-COCO, and Flickr. This indicates stronger confidence in identifying members under extremely low false-positive tolerance. Our results highlight the overlooked influence of autoencoder geometry on LDM memorization and provide a new perspective for analyzing privacy risks in diffusion-based generative models.
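The core geometric quantity in the abstract, the decoder pullback metric G(z) = J(z)ᵀJ(z) with J the decoder Jacobian, admits a simple per-dimension reading: the diagonal entry G_ii = ‖∂decoder/∂z_i‖² measures how strongly latent dimension i distorts the output, so dimensions can be ranked by it. Below is a minimal, hypothetical sketch of that ranking (the names `pullback_diagonal` and `rank_dimensions`, the finite-difference Jacobian, and the toy linear decoder are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def pullback_diagonal(decoder, z, eps=1e-4):
    """Diagonal of the decoder pullback metric G(z) = J(z)^T J(z),
    estimated by finite differences. G_ii = ||d decoder / d z_i||^2
    quantifies local output distortion along latent dimension i."""
    base = decoder(z)
    diag = np.empty(z.shape[0])
    for i in range(z.shape[0]):
        zp = z.copy()
        zp[i] += eps
        col_i = (decoder(zp) - base) / eps  # i-th Jacobian column
        diag[i] = float(col_i @ col_i)
    return diag

def rank_dimensions(decoder, z):
    """Latent dimensions sorted by descending pullback contribution,
    i.e., most-distorting (and, per the paper's finding, most
    memorization-prone) dimensions first."""
    return np.argsort(-pullback_diagonal(decoder, z))

# Toy linear decoder whose Jacobian columns have norms 3, 1, 2,
# so the expected ranking is dimension 0, then 2, then 1.
W = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
decoder = lambda z: W @ z
print(rank_dimensions(decoder, np.zeros(3)))  # → [0 2 1]
```

For a real LDM decoder one would replace the finite-difference loop with automatic differentiation (e.g., a vector-Jacobian product per latent dimension); the ranking step is unchanged.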