Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

πŸ“… 2024-10-08
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Text-to-image diffusion models are vulnerable to concept misuse, yet existing concept unlearning methods are evaluated incompletely, overlooking critical side effects. To address this, we propose the Holistic Unlearning Benchmark (HUB), a multidimensional evaluation framework tailored to concept unlearning in text-to-image diffusion models. HUB systematically assesses unlearning performance across six dimensions: faithfulness, alignment, pinpoint-ness, multilingual robustness, attack robustness, and efficiency. It spans 33 target concepts and over 520,000 prompts, including multilingual variants, and reveals substantial trade-offs among the dimensions. Extensive experiments show that no existing method dominates across all six dimensions. The HUB codebase and dataset are publicly released to foster standardized, reproducible evaluation of unlearning methods.

πŸ“ Abstract
As text-to-image diffusion models gain widespread commercial applications, there are increasing concerns about unethical or harmful use, including the unauthorized generation of copyrighted or sensitive content. Concept unlearning has emerged as a promising solution to these challenges by removing undesired and harmful information from the pre-trained model. However, previous evaluations primarily focus on whether target concepts are removed while image quality is preserved, neglecting broader impacts such as unintended side effects. In this work, we propose the Holistic Unlearning Benchmark (HUB), a comprehensive framework for evaluating unlearning methods across six key dimensions: faithfulness, alignment, pinpoint-ness, multilingual robustness, attack robustness, and efficiency. Our benchmark covers 33 target concepts, with 16,000 prompts per concept, spanning four categories: Celebrity, Style, Intellectual Property, and NSFW. Our investigation reveals that no single method excels across all evaluation criteria. By releasing our evaluation code and dataset, we hope to inspire further research in this area, leading to more reliable and effective unlearning methods.
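
The abstract fixes the benchmark's scale exactly; the minimal sketch below (Python) uses only the counts and names stated above, not the released data files, to show how the six dimensions, four categories, and prompt totals fit together.

```python
# Minimal sketch of HUB's stated composition; the concrete concept lists and
# prompt files in the released dataset are not reproduced here, only the counts
# and names given in the abstract.

DIMENSIONS = [
    "faithfulness", "alignment", "pinpoint-ness",
    "multilingual robustness", "attack robustness", "efficiency",
]
CATEGORIES = ["Celebrity", "Style", "Intellectual Property", "NSFW"]

NUM_CONCEPTS = 33
PROMPTS_PER_CONCEPT = 16_000

total_prompts = NUM_CONCEPTS * PROMPTS_PER_CONCEPT
print(f"{NUM_CONCEPTS} concepts x {PROMPTS_PER_CONCEPT:,} prompts each = {total_prompts:,} prompts")
# 33 x 16,000 = 528,000 prompts, consistent with "over 520,000" in the summary
```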
Problem

Research questions and friction points this paper is trying to address.

Evaluates concept unlearning methods for text-to-image diffusion models
Addresses ethical and legal concerns about unauthorized generation of copyrighted or sensitive content
Assesses broader impacts of unlearning beyond target-concept removal and image quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Holistic Unlearning Benchmark (HUB) for multi-faceted evaluation of concept unlearning
Evaluates unlearning across six key dimensions (see the sketch after this list)
Covers 33 target concepts with 16,000 prompts per concept across four categories
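
As a rough illustration of what a six-dimension evaluation loop might look like, here is a hypothetical sketch; `load_prompts` and `score` are placeholder stubs invented for illustration, not HUB's released API, and only the dimension names are taken from the paper.

```python
import random

DIMENSIONS = ["faithfulness", "alignment", "pinpoint-ness",
              "multilingual robustness", "attack robustness", "efficiency"]

def load_prompts(concept: str, dimension: str) -> list[str]:
    # Placeholder: the real benchmark ships curated prompts per concept and dimension.
    return [f"[{dimension}] prompt {i} about {concept}" for i in range(3)]

def score(model, prompts: list[str], dimension: str) -> float:
    # Placeholder metric: HUB uses dimension-specific evaluators; a random stub stands in here.
    return round(random.random(), 3)

def evaluate(model, concept: str) -> dict[str, float]:
    """Return one score per evaluation dimension for a single unlearned concept."""
    return {dim: score(model, load_prompts(concept, dim), dim) for dim in DIMENSIONS}

if __name__ == "__main__":
    # No real unlearned model is loaded in this sketch; None stands in for one.
    print(evaluate(model=None, concept="example-celebrity"))
```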
πŸ”Ž Similar Papers
No similar papers found.
Saemi Moon
CSE, POSTECH; GSAI, POSTECH
Minjong Lee
POSTECH
Sangdon Park
CSE, POSTECH; GSAI, POSTECH
Dongwoo Kim
CSE, POSTECH; GSAI, POSTECH