🤖 AI Summary
This work systematically examines how “concept erasure” techniques in text-to-image models fail when the erased concept has visually similar, binomial, or semantically related neighbors. We attribute these failures to concept entanglement, which produces ripple effects on related concepts and degrades image quality. To enable rigorous quantification, we introduce EraseBENCH, the first benchmark tailored for multi-dimensional concept relationships, comprising 100+ concepts and 1,000+ curated prompts. We propose a unified evaluation framework integrating multi-faceted prompt engineering, concept relationship modeling, and adversarial interference testing, with dual metrics assessing both image quality and semantic fidelity. Extensive experiments reveal that state-of-the-art erasure methods suffer from substantial degradation in visual quality and pervasive semantic leakage under realistic conditions, undermining their reliability for real-world deployment.
📝 Abstract
Concept erasure techniques have recently gained significant attention for their potential to remove unwanted concepts from text-to-image models. While these methods often demonstrate success in controlled scenarios, their robustness in real-world applications and readiness for deployment remain uncertain. In this work, we identify a critical gap in how sanitized models are evaluated, particularly with respect to their performance across different concept dimensions. We systematically investigate the failure modes of current concept erasure techniques, focusing on visually similar, binomial, and semantically related concepts. We propose that these interconnected relationships give rise to concept entanglement, which results in ripple effects and degraded image quality. To facilitate more comprehensive evaluation, we introduce EraseBENCH, a multi-dimensional benchmark designed to assess concept erasure methods in greater depth. Our dataset includes over 100 diverse concepts and more than 1,000 tailored prompts, paired with a comprehensive suite of metrics that together offer a holistic view of erasure efficacy. Our findings reveal that even state-of-the-art techniques struggle to maintain quality after erasure, which exposes a reliability gap and indicates that these approaches are not yet ready for real-world deployment.