EraseBench: Understanding The Ripple Effects of Concept Erasure Techniques

📅 2025-01-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work systematically exposes how "concept erasure" techniques for text-to-image models fail in cascading ways when the erased concept is visually similar, binomial, or semantically related to other concepts. It describes this as "concept confusion" and names a newly identified "concept ripple effect." To quantify the problem rigorously, the authors introduce EraseBench, a benchmark tailored to multi-dimensional concept relationships, comprising 100+ concepts and 1,000+ curated prompts. They propose a unified evaluation framework that combines multi-faceted prompt engineering, concept relationship modeling, and adversarial interference testing, with two families of metrics: one for image quality and one for semantic fidelity. Extensive experiments show that state-of-the-art erasure methods suffer substantial visual-quality degradation and pervasive semantic leakage under realistic conditions, which undermines their reliability for real-world deployment.

📝 Abstract

Concept erasure techniques have recently gained significant attention for their potential to remove unwanted concepts from text-to-image models. While these methods often demonstrate success in controlled scenarios, their robustness in real-world applications and readiness for deployment remain uncertain. In this work, we identify a critical gap in evaluating sanitized models, particularly in terms of their performance across various concept dimensions. We systematically investigate the failure modes of current concept erasure techniques, with a focus on visually similar, binomial, and semantically related concepts. We propose that these interconnected relationships give rise to concept entanglement, which produces ripple effects and degrades image quality. To facilitate more comprehensive evaluation, we introduce EraseBench, a multi-dimensional benchmark designed to assess concept erasure methods with greater depth. Our dataset includes over 100 diverse concepts and more than 1,000 tailored prompts, paired with a comprehensive suite of metrics that together offer a holistic view of erasure efficacy. Our findings reveal that even state-of-the-art techniques struggle to maintain quality after erasure, indicating that these approaches are not yet ready for real-world deployment and highlighting a reliability gap in current concept erasure techniques.
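To make the distinction between erasure efficacy and ripple effects concrete, here is a minimal sketch, assuming each probe prompt yields a concept-presence score in [0, 1] (for example from a classifier or a CLIP-style similarity) once with the original model and once with the sanitized model. The `Probe` record, the `summarize` function, the dimension labels, and the scores are all hypothetical illustrations, not EraseBench's actual metrics or API. Efficacy is taken here as the mean score drop on the erased concept itself; the ripple effect is the mean score drop on other concepts, grouped by their relationship to the erased one.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Probe:
    """One prompt scored before and after erasing a concept (hypothetical record)."""
    erased: str     # concept the erasure method was asked to remove
    probed: str     # concept the prompt actually asks the model to draw
    dimension: str  # hypothetical relationship label, e.g. "visually_similar"
    before: float   # concept-presence score from the original model, in [0, 1]
    after: float    # concept-presence score from the sanitized model, in [0, 1]


def summarize(probes: List[Probe]) -> Dict[str, object]:
    """Mean score drop on erased targets (efficacy) and on related concepts (ripple)."""
    target_drops: List[float] = []
    ripple: Dict[str, List[float]] = defaultdict(list)
    for p in probes:
        drop = p.before - p.after
        if p.probed == p.erased:
            target_drops.append(drop)  # a good eraser makes this drop large
        else:
            ripple[p.dimension].append(drop)  # a good eraser keeps this near zero
    efficacy: Optional[float] = mean(target_drops) if target_drops else None
    return {
        "efficacy": efficacy,
        "ripple": {dim: mean(drops) for dim, drops in ripple.items()},
    }


# Usage with made-up scores, for illustration only.
probes = [
    Probe("cat", "cat", "target", 0.9, 0.1),
    Probe("cat", "lynx", "visually_similar", 0.8, 0.5),
    Probe("cat", "dog", "semantically_related", 0.85, 0.75),
]
print(summarize(probes))
```

A clearly positive ripple value for a dimension means the eraser also suppressed concepts it was never asked to remove. Image-quality degradation would need separate metrics; this sketch only covers concept presence.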

Problem

Research questions and friction points this paper is trying to address.

Concept Erasure

Image Generation Models

Concept Confusion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Concept Erasure

EraseBench

Image Quality Degradation