🤖 AI Summary
Existing concept erasure methods for diffusion models suffer from low efficiency and degraded generation quality when erasing concepts at scale. To address this, this work proposes SuPLoRA, a method that groups semantically related concepts into a supertype-subtype hierarchy to enable parameter-shared, group-wise erasure. By encoding supertype information into the frozen down-projection matrix and fine-tuning only the up-projection matrix, with diffusion regularization to preserve denoising in unmasked regions, the approach mitigates the degradation of supertype generation caused by erasing many subtypes. Evaluated on a large-scale benchmark encompassing celebrities, objects, and explicit content, SuPLoRA significantly improves erasure efficiency while maintaining high-fidelity generation.
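In standard LoRA notation (a sketch; the symbols are ours, not necessarily the paper's), the SuPLoRA parameterization of an adapted weight $W \in \mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}$ can be written as

$$W' = W + B\,A_{\text{sup}}, \qquad A_{\text{sup}} \in \mathbb{R}^{r \times d_{\text{in}}} \ \text{(frozen)}, \quad B \in \mathbb{R}^{d_{\text{out}} \times r} \ \text{(trainable)},$$

where the down-projection $A_{\text{sup}}$ encodes the supertype concept and only the up-projection $B$ is fine-tuned during erasure, so the learned update $\Delta W = B\,A_{\text{sup}}$ is confined to the fixed supertype subspace.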
📝 Abstract
The success of diffusion models has raised concerns about the generation of unsafe or harmful content, prompting concept erasure approaches that fine-tune modules to suppress specific concepts while preserving general generative capabilities. However, as the number of erased concepts grows, these methods often become inefficient and ineffective: each concept requires a separate set of fine-tuned parameters, and accumulated erasures can degrade overall generation quality. In this work, we propose a supertype-subtype concept hierarchy that organizes erased concepts into a parent-child structure. Each erased concept is treated as a child node, and semantically related concepts (e.g., macaw and bald eagle) are grouped under a shared parent node, referred to as a supertype concept (e.g., bird). Rather than erasing concepts individually, we introduce an effective and efficient group-wise suppression method in which semantically similar concepts are grouped and erased jointly through a single shared set of learnable parameters. During the erasure phase, standard diffusion regularization is applied to preserve the denoising process in unmasked regions. To mitigate the degradation of supertype generation caused by excessive erasure of semantically related subtypes, we propose Supertype-Preserving Low-Rank Adaptation (SuPLoRA), which encodes the supertype concept in the frozen down-projection matrix and updates only the up-projection matrix during erasure. Theoretical analysis demonstrates the effectiveness of SuPLoRA in mitigating generation-quality degradation. Finally, we construct a more challenging benchmark that requires simultaneous erasure of concepts across diverse domains, including celebrities, objects, and pornographic content; on this benchmark, SuPLoRA substantially improves erasure efficiency while maintaining high-fidelity generation.
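As a concrete illustration, below is a minimal PyTorch sketch of this parameterization. The class name `SuPLoRALinear`, the `supertype_basis` argument (a matrix whose rows are assumed to encode the supertype concept, e.g., derived from supertype text embeddings), and the initialization details are our own assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class SuPLoRALinear(nn.Module):
    """LoRA-style adapter in which the down-projection is frozen to a
    supertype subspace and only the up-projection is fine-tuned."""

    def __init__(self, base: nn.Linear, supertype_basis: torch.Tensor, scale: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay fixed

        rank, in_features = supertype_basis.shape
        assert in_features == base.in_features

        # Frozen down-projection A_sup: rows span the supertype subspace
        # (hypothetical construction; how the paper builds it may differ).
        self.down = nn.Parameter(supertype_basis.clone(), requires_grad=False)
        # Trainable up-projection B, zero-initialized so training starts
        # from the unmodified base model.
        self.up = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # W' x = W x + scale * B (A_sup x); only self.up receives gradients.
        return self.base(x) + self.scale * (x @ self.down.t()) @ self.up.t()
```

Because only `self.up` receives gradients during erasure fine-tuning, a single adapter of this form can be shared across an entire subtype group, consistent with the group-wise suppression described in the abstract.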