Erasing Thousands of Concepts: Towards Scalable and Practical Concept Erasure for Text-to-Image Diffusion Models

📅 2026-04-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of preventing text-to-image diffusion models from generating undesirable content, such as copyrighted material, by introducing the ETC framework. Existing concept erasure methods struggle to balance scalability, precision, and robustness, which limits them to erasing only a few hundred concepts. ETC models low-rank concept distributions with a Student's t-distribution mixture model and uses affine optimal transport to pinpoint and erase target concepts. Non-target content is preserved by anchoring the boundaries of the target concept distributions, so no predefined anchor concepts are required. A Mixture-of-Experts module, MoEraser, is then trained to remove target embeddings while preserving the anchor embeddings. Injecting noise into the text embedding projector and fine-tuning MoEraser for recovery makes the framework robust to white-box attacks such as module removal. Evaluated on more than 2,000 concepts across heterogeneous domains and multiple diffusion models, ETC reports state-of-the-art scalability and precision in large-scale concept erasure while preserving generation quality.
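To make the mixture-modeling step more concrete, the following is a minimal one-dimensional sketch of a Student's t-mixture, not the paper's implementation. The function names `student_t_pdf` and `responsibilities` and the `(weight, mu, sigma, nu)` component layout are hypothetical, and the paper's low-rank, high-dimensional setting is not reproduced here. It only shows how a heavy-tailed mixture assigns an embedding coordinate to components.

```python
import math

def student_t_pdf(x, mu, sigma, nu):
    """Density of a 1-D Student's t-distribution.

    mu: location, sigma: scale (> 0), nu: degrees of freedom (> 0).
    Computed in log space for numerical stability.
    """
    z = (x - mu) / sigma
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(z * z / nu))

def responsibilities(x, components):
    """Posterior probability of each mixture component given x.

    components: list of (weight, mu, sigma, nu) tuples (hypothetical layout).
    """
    weighted = [w * student_t_pdf(x, mu, s, nu) for w, mu, s, nu in components]
    total = sum(weighted)
    return [v / total for v in weighted]

# Two illustrative components; the values are made up for the example.
mixture = [(0.5, -1.0, 1.0, 3.0), (0.5, 1.0, 1.0, 3.0)]
print(responsibilities(0.8, mixture))
```

Compared with a Gaussian mixture, the heavier tails of the t-distribution make the component assignment less sensitive to outlying embeddings, which is one commonly cited reason for choosing it.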

📝 Abstract

Large-scale text-to-image (T2I) diffusion models deliver remarkable visual fidelity but pose safety risks due to their capacity to reproduce undesirable content, such as copyrighted material. Concept erasure has emerged as a mitigation strategy, yet existing approaches struggle to balance scalability, precision, and robustness, which restricts their applicability to erasing only a few hundred concepts. To address these limitations, we present Erasing Thousands of Concepts (ETC), a scalable framework capable of erasing thousands of concepts while preserving generation quality. Our method first models low-rank concept distributions via a Student's t-distribution Mixture Model (tMM). This enables pinpoint erasure of target concepts via affine optimal transport while preserving other concepts by anchoring the boundaries of the target concept distributions, without pre-defined anchor concepts. We then train a Mixture-of-Experts (MoE)-based module, termed MoEraser, which removes target embeddings while preserving the anchor embeddings. By injecting noise into the text embedding projector and fine-tuning MoEraser for recovery, our framework achieves robustness to white-box attacks such as module removal. Extensive experiments on over 2,000 concepts across heterogeneous domains and diffusion models demonstrate state-of-the-art scalability and precision in large-scale concept erasure.
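As a rough illustration of what an affine optimal transport map looks like, the sketch below uses the well-known closed form for transporting one Gaussian onto another under squared Euclidean cost, restricted to diagonal covariances. This is a simplifying assumption: the paper works with t-mixture components in a low-rank space, and its exact formulation is not given in the abstract. The function name `affine_ot_map_diag` and the example numbers are hypothetical.

```python
import math

def affine_ot_map_diag(mean_src, var_src, mean_dst, var_dst):
    """Return the affine map T(x) = mean_dst + A (x - mean_src).

    For Gaussians with diagonal covariances, the optimal transport map under
    squared Euclidean cost has A = diag(sqrt(var_dst / var_src)).
    All variances must be strictly positive.
    """
    scales = [math.sqrt(vd / vs) for vs, vd in zip(var_src, var_dst)]

    def transport(x):
        # Shift to the source mean, rescale per dimension, shift to the target mean.
        return [md + a * (xi - ms)
                for xi, ms, md, a in zip(x, mean_src, mean_dst, scales)]

    return transport

# Hypothetical 2-D example: move a "target concept" distribution onto another one.
T = affine_ot_map_diag([0.0, 0.0], [1.0, 4.0], [1.0, -1.0], [4.0, 1.0])
print(T([1.0, 2.0]))  # one standard deviation above the source mean in each dimension
```

Because the map is affine, it moves the source mean exactly onto the target mean and rescales each dimension so the transported variance matches the target variance, which is what makes this kind of relocation cheap to compute per concept.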

Problem

Research questions and friction points this paper is trying to address.

concept erasure

text-to-image diffusion models

scalability

safety

undesirable content

Innovation

Methods, ideas, or system contributions that make the work stand out.

concept erasure

diffusion models

Mixture-of-Experts