FADE: Adversarial Concept Erasure in Flow Models

📅 2025-07-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses privacy and fairness risks in text-to-image diffusion models that memorize sensitive concepts, such as personal identities or harmful stereotypes. It proposes FADE (Fair Adversarial Diffusion Erasure), a concept erasure method that fine-tunes an existing model rather than retraining it from scratch. FADE combines trajectory-aware fine-tuning with an adversarial objective, and the authors prove that the approach minimizes the mutual information between the erased concept and the model's outputs. Evaluated on Stable Diffusion and FLUX, FADE aims to balance reliable concept removal with high image fidelity. On object, celebrity, explicit-content, and style erasure benchmarks, it is reported to improve the harmonic mean of concept removal and fidelity by 5–10% over the best prior method.

📝 Abstract

Diffusion models have demonstrated remarkable image generation capabilities, but also pose risks in privacy and fairness by memorizing sensitive concepts or perpetuating biases. We propose a novel concept erasure method for text-to-image diffusion models, designed to remove specified concepts (e.g., a private individual or a harmful stereotype) from the model's generative repertoire. Our method, termed FADE (Fair Adversarial Diffusion Erasure), combines a trajectory-aware fine-tuning strategy with an adversarial objective to ensure the concept is reliably removed while preserving overall model fidelity. Theoretically, we prove a formal guarantee that our approach minimizes the mutual information between the erased concept and the model's outputs, ensuring privacy and fairness. Empirically, we evaluate FADE on Stable Diffusion and FLUX, using benchmarks from prior work (e.g., object, celebrity, explicit content, and style erasure tasks from MACE). FADE achieves state-of-the-art concept removal performance, surpassing recent baselines like ESD, UCE, MACE, and ANT in terms of removal efficacy and image quality. Notably, FADE improves the harmonic mean of concept removal and fidelity by 5–10% over the best prior method. We also conduct an ablation study to validate each component of FADE, confirming that our adversarial and trajectory-preserving objectives each contribute to its superior performance. Our work sets a new standard for safe and fair generative modeling by unlearning specified concepts without retraining from scratch.
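The headline result is stated in terms of a harmonic mean of concept removal and fidelity. As a rough illustration of why that metric is used, the sketch below computes the harmonic mean of two scores. It assumes both scores are already normalized to [0, 1] with higher meaning better; the function name and the example numbers are hypothetical, and the exact component metrics used in the paper are not given in this summary.

```python
def harmonic_mean(removal: float, fidelity: float) -> float:
    """Harmonic mean of two scores assumed to lie in [0, 1] (higher is better).

    The harmonic mean is dominated by the smaller score, so a method cannot
    score well by erasing aggressively while destroying image quality,
    or by preserving quality while barely erasing anything.
    """
    if not (0.0 <= removal <= 1.0 and 0.0 <= fidelity <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if removal == 0.0 or fidelity == 0.0:
        return 0.0  # avoid division by zero; one failed axis zeroes the score
    return 2.0 * removal * fidelity / (removal + fidelity)


# Hypothetical scores, for illustration only (not results from the paper).
balanced = harmonic_mean(0.75, 0.75)   # both axes moderately good
lopsided = harmonic_mean(0.99, 0.51)   # same arithmetic mean, worse balance
```

With these made-up inputs, both pairs have the same arithmetic mean (0.75), but the lopsided pair gets a lower harmonic mean, which is the behaviour that makes the metric suitable for trading off erasure against fidelity.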

Problem

Research questions and friction points this paper is trying to address.

Remove sensitive concepts from diffusion models

Ensure privacy and fairness in image generation

Maintain model fidelity while erasing specified concepts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory-aware fine-tuning for concept removal

Adversarial objective ensures minimal mutual information

State-of-the-art performance in erasure and fidelity
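
The link between an adversarial objective and mutual information can be sketched with a standard variational bound. The formulation below is an illustrative assumption and not necessarily the objective used in the paper: C is a discrete indicator of the target concept, X is the generated image, q_φ is a hypothetical adversary that tries to predict C from X, θ are the fine-tuned model weights, θ₀ are the frozen original weights, ε denotes the model's noise prediction at step t, and λ is an assumed weighting hyperparameter.

```latex
\begin{aligned}
I(C; X) \;&\ge\; H(C) + \mathbb{E}_{(c,x)}\big[\log q_\phi(c \mid x)\big], \\[4pt]
\min_{\theta}\,\max_{\phi}\;\;&
  \mathbb{E}_{c,\; x \sim p_\theta(\cdot \mid c)}\big[\log q_\phi(c \mid x)\big]
  \;+\; \lambda\, \mathbb{E}_{t,\, x_t}
  \big\lVert \epsilon_\theta(x_t, t) - \epsilon_{\theta_0}(x_t, t) \big\rVert_2^2 .
\end{aligned}
```

The first line is a lower bound on mutual information that becomes tight when q_φ matches the true posterior of C given X. The inner maximization over φ tightens that bound, and the outer minimization over θ pushes the bound down, so the generated images carry less information about the concept. The second term, which does not depend on φ, stands in for a trajectory-preserving penalty evaluated on prompts unrelated to the erased concept, keeping the fine-tuned denoising path close to the original model's.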

🔎 Similar Papers

Erasing Conceptual Knowledge from Language Models