🤖 AI Summary
This paper identifies a fundamental privacy vulnerability in Conditional Latent Diffusion Models (CLDMs) used for privacy-preserving data augmentation: their reliance on structured conditional signals—such as edges or depth maps—for image synthesis inadvertently leaks identity information. To expose this flaw, we propose a contrastive learning–based identification framework and a black-box inversion attack, enabling the first systematic demonstration that CLDM-augmented images violate basic privacy guarantees, including *k*-anonymity. Experiments show that adversaries can achieve high-accuracy cross-image identity re-identification on standard face recognition benchmarks using *only* the augmented images—without access to the originals or to model internals. Our core contributions are threefold: (1) establishing conditional signals as the primary source of identity leakage; (2) revealing CLDMs’ inherent susceptibility to black-box inversion attacks; and (3) providing both a theoretical caution and practical design boundaries for developing privacy-enhancing generative models.
📝 Abstract
Latent diffusion models can be used as a powerful augmentation method to artificially extend datasets for enhanced training. To the human eye, these augmented images look very different from the originals. Previous work has suggested using this data augmentation technique for data anonymization. However, we show that latent diffusion models conditioned on features like depth maps or edges to guide the diffusion process are not suitable as a privacy-preserving method. We use a contrastive learning approach to train a model that can correctly identify individuals from a pool of candidates. Moreover, we demonstrate that anonymization using conditioned diffusion models is susceptible to black-box attacks. We attribute the success of the described methods to the conditioning of the latent diffusion model in the anonymization process: the diffusion model is instructed to produce similar edges for the anonymized images, so a model can learn to recognize these patterns for identification.
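To make the leakage mechanism concrete, here is a minimal, self-contained sketch of edge-based re-identification. It is *not* the paper's contrastive framework: the gradient-magnitude edge proxy (standing in for Canny or depth-map conditioning signals), the additive-noise "augmentation" (standing in for diffusion-based anonymization that preserves edge structure), and all function names are illustrative assumptions. The point it demonstrates is the abstract's: if the augmented image retains the original's edge structure, a simple matcher can pick the right identity out of a candidate pool.

```python
import numpy as np

def edge_features(img: np.ndarray) -> np.ndarray:
    # Crude edge map via gradient magnitude -- a stand-in for the
    # structured conditioning signals (Canny edges, depth maps) the
    # paper discusses. Normalized so a dot product is cosine similarity.
    gy, gx = np.gradient(img.astype(float))
    e = np.hypot(gx, gy).ravel()
    return e / (np.linalg.norm(e) + 1e-12)

def identify(query: np.ndarray, pool: list[np.ndarray]) -> int:
    # Return the index of the pool candidate whose edge features
    # best match the query's (nearest neighbor in cosine similarity).
    q = edge_features(query)
    scores = [float(q @ edge_features(c)) for c in pool]
    return int(np.argmax(scores))

# Toy demo: each "identity" is an image with distinct structure; the
# "anonymized" version perturbs texture but keeps edges intact.
rng = np.random.default_rng(0)
originals = [rng.random((32, 32)) for _ in range(5)]
for i, img in enumerate(originals):
    img[4 + 3 * i : 10 + 3 * i, 8:24] += 2.0  # identity-specific structure
anonymized = [img + 0.1 * rng.random((32, 32)) for img in originals]

# Every anonymized image is re-identified from edge structure alone.
assert all(identify(anonymized[i], originals) == i for i in range(5))
```

The real attack trains a contrastive encoder rather than using raw gradients, but the failure mode is the same: conditioning forces the generator to reproduce the original's edges, so any representation sensitive to edge layout suffices for re-identification.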