🤖 AI Summary
Text-to-image diffusion models are vulnerable to concept misuse, yet existing concept unlearning methods are evaluated incompletely, overlooking critical side effects. To address this, we propose the Holistic Unlearning Benchmark (HUB), the first multidimensional evaluation framework tailored to concept unlearning in text-to-image diffusion models. HUB systematically assesses unlearning performance across six dimensions: faithfulness, alignment, pinpoint-ness, multilingual robustness, attack robustness, and efficiency. It covers 33 target concepts and over 520,000 multilingual prompts, revealing substantial trade-offs among dimensions that single-metric evaluations conceal. Extensive experiments show that no existing method dominates across all six dimensions. We publicly release HUB's codebase and dataset to foster standardized, reproducible evaluation of unlearning techniques for trustworthy AI.
Abstract
As text-to-image diffusion models see widespread commercial use, concerns are growing about unethical or harmful applications, including the unauthorized generation of copyrighted or sensitive content. Concept unlearning has emerged as a promising solution, removing undesired and harmful information from the pre-trained model. However, previous evaluations primarily ask whether target concepts are removed while image quality is preserved, neglecting broader impacts such as unintended side effects. In this work, we propose the Holistic Unlearning Benchmark (HUB), a comprehensive framework for evaluating unlearning methods across six key dimensions: faithfulness, alignment, pinpoint-ness, multilingual robustness, attack robustness, and efficiency. Our benchmark covers 33 target concepts, with 16,000 prompts per concept, spanning four categories: Celebrity, Style, Intellectual Property, and NSFW. Our investigation reveals that no single method excels across all evaluation criteria. By releasing our evaluation code and dataset, we hope to inspire further research leading to more reliable and effective unlearning methods.
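To make the shape of such a multidimensional evaluation concrete, here is a minimal sketch of scoring every (concept, dimension) pair. The dimension and category names come from the abstract; everything else (the `EvalCell` type, the `evaluate` helper, the stand-in concepts and scorer) is a hypothetical illustration, not HUB's actual API.

```python
from dataclasses import dataclass
from itertools import product

# Six evaluation dimensions named in the abstract.
DIMENSIONS = [
    "faithfulness", "alignment", "pinpoint-ness",
    "multilingual robustness", "attack robustness", "efficiency",
]

# Four concept categories named in the abstract (the benchmark spans
# 33 target concepts across these; the split is not shown here).
CATEGORIES = ["Celebrity", "Style", "Intellectual Property", "NSFW"]

@dataclass
class EvalCell:
    concept: str
    dimension: str
    score: float  # placeholder metric in [0, 1]

def evaluate(concepts, score_fn):
    """Score every (concept, dimension) pair with a user-supplied metric."""
    return [EvalCell(c, d, score_fn(c, d)) for c, d in product(concepts, DIMENSIONS)]

# Toy usage: three stand-in concepts and a constant scorer.
cells = evaluate(["concept_a", "concept_b", "concept_c"], lambda c, d: 0.5)
print(len(cells))  # 3 concepts x 6 dimensions = 18 cells
```

The point of the grid structure is that a method's result is a vector over dimensions, not a single number, so "no method dominates" means no row is best in every column.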