🤖 AI Summary
To address concept contamination in diffusion-based text-to-image generation, which arises from copyright-infringing or unsafe content in training data, this paper proposes the first concept removal method that simultaneously achieves effectiveness, utility preservation, and adversarial robustness. The approach is a fine-grained concept filter grounded in distances in CLIP embedding space: it uses contrastive fine-tuning to optimize text embeddings, enabling precise, concept-level suppression of unacceptable content, and it incorporates an adversarial prompt evaluation mechanism to substantially enhance robustness against perturbed or obfuscated prompts. Extensive evaluations across multiple benchmarks demonstrate a >35% improvement in interception accuracy, minimal utility degradation (only a 1.2% drop in generation quality), and sustained robustness (>92%) against diverse adversarial prompts. To the authors' knowledge, this is the first work to jointly satisfy all three core objectives of concept removal in a unified framework.
📝 Abstract
Diffusion-based text-to-image models are trained on large datasets scraped from the Internet, which may contain unacceptable concepts (e.g., copyright-infringing or unsafe content). We need concept removal techniques (CRTs) that are i) effective in preventing the generation of images with unacceptable concepts, ii) utility-preserving on acceptable concepts, and iii) robust against evasion with adversarial prompts. No prior CRT satisfies all these requirements simultaneously. We introduce Espresso, the first robust concept filter based on Contrastive Language-Image Pre-Training (CLIP). We identify unacceptable concepts by comparing the distance between the embedding of a generated image and the text embeddings of both unacceptable and acceptable concepts. This lets us fine-tune for robustness by separating the text embeddings of unacceptable and acceptable concepts while preserving utility. We present a pipeline to evaluate various CRTs and show that Espresso is more effective and robust than prior CRTs, while retaining utility.
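The filtering idea in the abstract, comparing a generated image's embedding against the text embeddings of an unacceptable and an acceptable concept, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax over cosine similarities, the `temperature` value, and the `threshold` are all assumptions made for illustration, and the short toy vectors stand in for real CLIP embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def unacceptable_probability(
    image_emb: Sequence[float],
    unacceptable_text_emb: Sequence[float],
    acceptable_text_emb: Sequence[float],
    temperature: float = 0.01,  # assumed value, not taken from the paper
) -> float:
    """Two-way softmax over the image's similarity to each concept embedding."""
    s_u = cosine_similarity(image_emb, unacceptable_text_emb) / temperature
    s_a = cosine_similarity(image_emb, acceptable_text_emb) / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_u, s_a)
    e_u = math.exp(s_u - m)
    e_a = math.exp(s_a - m)
    return e_u / (e_u + e_a)


def is_unacceptable(
    image_emb: Sequence[float],
    unacceptable_text_emb: Sequence[float],
    acceptable_text_emb: Sequence[float],
    threshold: float = 0.5,  # assumed decision threshold
) -> bool:
    """Flag the image when it is closer to the unacceptable concept."""
    p = unacceptable_probability(image_emb, unacceptable_text_emb, acceptable_text_emb)
    return p >= threshold


# Toy 3-dimensional vectors standing in for CLIP embeddings (hypothetical data).
unacceptable_concept = [1.0, 0.0, 0.0]
acceptable_concept = [0.0, 1.0, 0.0]

flagged = is_unacceptable([0.9, 0.1, 0.0], unacceptable_concept, acceptable_concept)
passed = is_unacceptable([0.1, 0.9, 0.0], unacceptable_concept, acceptable_concept)
print(flagged, passed)
```

Under this sketch, fine-tuning that pushes the two text embeddings further apart would widen the similarity gap for a given image, which is one way to read the abstract's claim that separating the embeddings supports robustness while preserving utility.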