Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

πŸ“… 2024-10-08
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Text-to-image diffusion models are vulnerable to concept misuse, yet existing concept unlearning methods are evaluated incompletely, overlooking critical side effects. To address this, we propose the Holistic Unlearning Benchmark (HUB), a multidimensional evaluation framework tailored to concept unlearning in text-to-image diffusion models. HUB systematically assesses unlearning performance across six dimensions: faithfulness, alignment, pinpoint-ness, multilingual robustness, attack robustness, and efficiency. It spans 33 target concepts and over 520,000 prompts, including multilingual variants, and reveals substantial trade-offs among the dimensions. Extensive experiments show that no existing method dominates across all six dimensions. The HUB codebase and dataset are publicly released to foster standardized, reproducible evaluation of unlearning methods.

πŸ“ Abstract
As text-to-image diffusion models gain widespread commercial applications, there are increasing concerns about unethical or harmful use, including the unauthorized generation of copyrighted or sensitive content. Concept unlearning has emerged as a promising solution to these challenges by removing undesired and harmful information from the pre-trained model. However, previous evaluations primarily focus on whether target concepts are removed while image quality is preserved, neglecting broader impacts such as unintended side effects. In this work, we propose the Holistic Unlearning Benchmark (HUB), a comprehensive framework for evaluating unlearning methods across six key dimensions: faithfulness, alignment, pinpoint-ness, multilingual robustness, attack robustness, and efficiency. Our benchmark covers 33 target concepts, with 16,000 prompts per concept, spanning four categories: Celebrity, Style, Intellectual Property, and NSFW. Our investigation reveals that no single method excels across all evaluation criteria. By releasing our evaluation code and dataset, we hope to inspire further research in this area, leading to more reliable and effective unlearning methods.
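
The abstract fixes the benchmark's scale exactly; the minimal sketch below (Python) uses only the counts and names stated above, not the released data files, to show how the six dimensions, four categories, and prompt totals fit together.

```python
# Minimal sketch of HUB's stated composition; the concrete concept lists and
# prompt files in the released dataset are not reproduced here, only the counts
# and names given in the abstract.

DIMENSIONS = [
    "faithfulness", "alignment", "pinpoint-ness",
    "multilingual robustness", "attack robustness", "efficiency",
]
CATEGORIES = ["Celebrity", "Style", "Intellectual Property", "NSFW"]

NUM_CONCEPTS = 33
PROMPTS_PER_CONCEPT = 16_000

total_prompts = NUM_CONCEPTS * PROMPTS_PER_CONCEPT
print(f"{NUM_CONCEPTS} concepts x {PROMPTS_PER_CONCEPT:,} prompts each = {total_prompts:,} prompts")
# 33 x 16,000 = 528,000 prompts, consistent with "over 520,000" in the summary
```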
Problem

Research questions and friction points this paper is trying to address.

Evaluates concept unlearning methods for text-to-image diffusion models
Addresses ethical and legal concerns about unauthorized generation of copyrighted or sensitive content
Assesses broader impacts of unlearning beyond target-concept removal and image quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Holistic Unlearning Benchmark (HUB) for multi-faceted evaluation of concept unlearning
Evaluates unlearning across six key dimensions (see the sketch after this list)
Covers 33 target concepts with 16,000 prompts per concept across four categories
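
As a rough illustration of what a six-dimension evaluation loop might look like, here is a hypothetical sketch; `load_prompts` and `score` are placeholder stubs invented for illustration, not HUB's released API, and only the dimension names are taken from the paper.

```python
import random

DIMENSIONS = ["faithfulness", "alignment", "pinpoint-ness",
              "multilingual robustness", "attack robustness", "efficiency"]

def load_prompts(concept: str, dimension: str) -> list[str]:
    # Placeholder: the real benchmark ships curated prompts per concept and dimension.
    return [f"[{dimension}] prompt {i} about {concept}" for i in range(3)]

def score(model, prompts: list[str], dimension: str) -> float:
    # Placeholder metric: HUB uses dimension-specific evaluators; a random stub stands in here.
    return round(random.random(), 3)

def evaluate(model, concept: str) -> dict[str, float]:
    """Return one score per evaluation dimension for a single unlearned concept."""
    return {dim: score(model, load_prompts(concept, dim), dim) for dim in DIMENSIONS}

if __name__ == "__main__":
    # No real unlearned model is loaded in this sketch; None stands in for one.
    print(evaluate(model=None, concept="example-celebrity"))
```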
πŸ”Ž Similar Papers
No similar papers found.
Saemi Moon
CSE, POSTECH; GSAI, POSTECH
Minjong Lee
POSTECH
Sangdon Park
CSE, POSTECH; GSAI, POSTECH
Dongwoo Kim
CSE, POSTECH; GSAI, POSTECH