🤖 AI Summary
Existing unlearning methods for text-to-image diffusion models often fail to remove target concepts precisely and inadvertently degrade unrelated generative capabilities, falling short of requirements such as copyright compliance. This work proposes SurgUn, the first approach to bring retroactive interference theory into diffusion model unlearning. Through targeted weight-space updates, SurgUn induces competition between the target concept and newly learned content along shared representational pathways, enabling precise forgetting. The method is compatible with mainstream architectures, including the U-Net and the Diffusion Transformer, and consistently removes specific visual concepts across Stable Diffusion v1.5, SDXL, and SANA while largely preserving general generation capability, validating its generality and scalability.
📝 Abstract
Unlearning in text-to-image diffusion models often leads to uneven concept removal and unintended forgetting of unrelated capabilities. This complicates tasks such as copyright compliance, protected-data mitigation, artist opt-outs, and policy-driven content updates. As models grow larger and adopt more diverse architectures, achieving precise and selective unlearning while preserving generative quality becomes increasingly challenging. We introduce SurgUn (pronounced "Surgeon"), a surgical unlearning method that applies targeted weight-space updates to remove specific visual concepts from text-conditioned diffusion models. Our approach is motivated by retroactive interference theory, which holds that newly acquired memories can overwrite, suppress, or impede access to prior ones by competing for shared representational pathways. We adapt this principle to diffusion models by inducing retroactive concept interference, destabilizing only the target concept while preserving unrelated capabilities through a novel training paradigm. SurgUn achieves high-precision unlearning across diverse settings: it performs strongly on compact U-Net-based models such as Stable Diffusion v1.5, scales effectively to the larger U-Net architecture SDXL, and extends to SANA, a Diffusion Transformer architecture that remains underexplored in unlearning research.
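The abstract does not give SurgUn's actual training objective, but the core tension it describes, new content overwriting a target concept along shared pathways while unrelated behavior is anchored, can be illustrated with a toy sketch. Everything below is a hypothetical simplification: a linear map stands in for the diffusion model, `e_anchor` is an invented surrogate concept, and the loss weighting is arbitrary. It is meant only to convey the interference-versus-retention trade-off, not to reproduce the paper's method.

```python
import numpy as np

# Toy sketch of "retroactive concept interference" (hypothetical simplification,
# NOT the paper's objective): retrain the target concept's pathway to reproduce
# an anchor concept's output, while a retention term pins unrelated concepts'
# outputs to those of the frozen original model.
rng = np.random.default_rng(0)
d = 16
W0 = rng.normal(size=(d, d))      # frozen copy of the original weights
W = W0.copy()                     # weights being surgically edited

e_target = rng.normal(size=d)     # embedding of the concept to forget
e_anchor = rng.normal(size=d)     # surrogate concept that "overwrites" it
e_keep = rng.normal(size=(8, d))  # unrelated concepts whose behavior must survive

y_anchor = W0 @ e_anchor          # the anchor concept's original output
y_keep = e_keep @ W0.T            # original outputs for the unrelated concepts

lam, lr, steps = 1.0, 0.01, 5000
for _ in range(steps):
    # Interference term: the forgotten pathway should yield the anchor's output.
    r_forget = W @ e_target - y_anchor
    grad = np.outer(r_forget, e_target)
    # Retention term: unrelated pathways are pulled back to their original outputs.
    r_keep = e_keep @ W.T - y_keep
    grad += lam * r_keep.T @ e_keep / len(e_keep)
    W -= lr * grad

# After editing, the target pathway behaves like the anchor while the
# unrelated pathways stay close to the original model (both residuals small).
print(np.linalg.norm(W @ e_target - y_anchor))
print(np.linalg.norm(e_keep @ W.T - y_keep))
```

In a real diffusion model, the outputs would be predicted noise under concept-conditioned prompts rather than linear-map products, and the retention set would be a collection of unrelated prompts; the sketch only shows how one objective can destabilize a single concept while anchoring everything else.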