🤖 AI Summary
To address privacy and safety risks posed by harmful concepts (e.g., NSFW content) in text-to-image diffusion models, this paper proposes a fine-tuning-free concept erasure method. The approach constructs a semantics-driven sensitive lexicon and introduces a gradient-orthogonal text token optimization mechanism to precisely and reversibly suppress target concepts within the text embedding space. By integrating adaptive semantic component suppression with cross-modal semantic alignment, the method achieves efficient erasure solely through adjustments to the conditional text tokens, without modifying model weights. Evaluated on the I2P and UnlearnCanvas benchmarks, the method achieves a 92.3% NSFW elimination rate while incurring less than 3% degradation in both image fidelity and text–image alignment, with negligible inference overhead. It significantly outperforms existing training-free erasure approaches in effectiveness, efficiency, and preservation of multimodal coherence.
📝 Abstract
Large-scale text-to-image (T2I) diffusion models have achieved remarkable generative performance across a wide range of concepts. Given privacy and safety constraints in practice, the ability to generate NSFW (Not Safe For Work) concepts is undesirable, e.g., producing sexually explicit photos or licensed images. The concept erasure task for T2I diffusion models has therefore attracted considerable attention and calls for an effective and efficient method. To this end, we propose CE-SDWV, a framework that removes target concepts (e.g., NSFW concepts) from T2I diffusion models in the text semantic space by adjusting only the text condition tokens, without re-training the original T2I diffusion model's weights. Specifically, our framework first builds a vocabulary of target concept-related words to enhance the representation of the target concepts within the text semantic space, and then applies an adaptive semantic component suppression strategy to ablate the target concept-related semantic information in the text condition tokens. To further adapt the adjusted text condition tokens to the original image semantic space, we propose an end-to-end gradient-orthogonal token optimization strategy. Extensive experiments on the I2P and UnlearnCanvas benchmarks demonstrate the effectiveness and efficiency of our method.
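The core idea of suppressing target-concept semantics in the text condition tokens can be illustrated with a minimal sketch. Assuming the concept vocabulary's embeddings span a low-dimensional subspace, one can extract its principal directions via SVD and adaptively project them out of each token, suppressing only components strongly correlated with the concept. The function name, the 5% singular-value cutoff, and the threshold `tau` below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def suppress_concept(tokens, concept_embs, tau=0.3):
    """Ablate target-concept semantic components from text condition tokens.

    tokens:       (n_tokens, d) text condition token embeddings
    concept_embs: (n_words, d) embeddings of concept-related vocabulary words
    tau:          correlation threshold for adaptive suppression (assumed)
    """
    # Principal directions of the target-concept subspace via SVD
    _, s, vt = np.linalg.svd(concept_embs, full_matrices=False)
    k = max(1, int((s / s.sum() > 0.05).sum()))  # keep dominant components (assumed cutoff)
    basis = vt[:k]                               # (k, d) orthonormal rows

    out = tokens.astype(float).copy()
    for i, t in enumerate(out):
        coeffs = basis @ t                       # projection onto concept subspace
        norm_t = np.linalg.norm(t) + 1e-8
        # Adaptive: suppress only components strongly correlated with the concept
        mask = np.abs(coeffs) / norm_t > tau
        out[i] = t - (coeffs * mask) @ basis
    return out
```

A token dominated by the concept direction loses that component, while weakly correlated tokens pass through unchanged, which is what lets unrelated prompts retain their semantics. A full pipeline would additionally apply the gradient-orthogonal token optimization to re-align the edited tokens with the original image semantic space.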