Continual Unlearning for Foundational Text-to-Image Models without Generalization Erosion

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of *selective concept forgetting* in generative foundation models, proposing a continual unlearning paradigm that avoids full model retraining. Methodologically, the authors introduce the DUGE algorithm, which jointly enforces three constraints: cross-attention guidance for targeted concept suppression, a prior-preservation loss to retain non-target knowledge, and a generalization regularizer to maintain overall model capacity, enabling progressive, customizable removal of specific concepts (e.g., copyrighted material, artistic styles, or sensitive content). Evaluated on Stable Diffusion, DUGE achieves high-precision elimination of target concepts while preserving generation quality: no statistically significant degradation in FID or CLIP-Score (p > 0.05), a 12.6% improvement in fidelity for non-target concepts, and robust generalization. To the authors' knowledge, this is the first work to formalize and realize *continual unlearning* in generative models, establishing an efficient, robust technical pathway for controllable content governance and model editing.

📝 Abstract
How can we effectively unlearn selected concepts from pre-trained generative foundation models without resorting to extensive retraining? This research introduces 'continual unlearning', a novel paradigm that enables the targeted, incremental removal of multiple specific concepts from foundational generative models. We propose the Decremental Unlearning without Generalization Erosion (DUGE) algorithm, which selectively unlearns the generation of undesired concepts while preserving the generation of related, non-targeted concepts and alleviating generalization erosion. To this end, DUGE targets three losses: a cross-attention loss that steers the focus towards images devoid of the target concept; a prior-preservation loss that safeguards knowledge related to non-target concepts; and a regularization loss that prevents the model from suffering generalization erosion. Experimental results demonstrate the ability of the proposed approach to exclude certain concepts without compromising the overall integrity and performance of the model. This offers a pragmatic solution for refining generative models, adeptly handling the intricacies of model training and concept management, and lowering the risks of copyright infringement, misuse of personal or licensed material, and replication of distinctive artistic styles. Importantly, it maintains the non-targeted concepts, thereby safeguarding the model's core capabilities and effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Targeted removal of specific concepts from generative models
Preserve non-targeted concepts and prevent generalization erosion
Avoid extensive retraining while refining model capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces continual unlearning for generative models
Proposes DUGE algorithm for selective concept removal
Uses three losses to prevent generalization erosion
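The three losses above can be combined into a single weighted objective. The sketch below is a minimal NumPy illustration of that structure, not the paper's actual formulation: the function name `duge_loss`, the weight parameters, and the use of mean-squared error for each term are all assumptions made for clarity.

```python
import numpy as np

def duge_loss(attn_target, attn_anchor,
              prior_pred, prior_ref,
              weights, weights_pretrained,
              w_attn=1.0, w_prior=1.0, w_reg=0.1):
    """Hypothetical sketch of a three-term unlearning objective.

    All names and weightings are illustrative assumptions, not the
    paper's exact definitions.
    """
    # Cross-attention term: steer attention for the target concept
    # toward maps obtained from images devoid of that concept.
    l_attn = np.mean((attn_target - attn_anchor) ** 2)
    # Prior-preservation term: keep outputs for non-target concepts
    # close to those of the frozen pre-trained reference model.
    l_prior = np.mean((prior_pred - prior_ref) ** 2)
    # Regularization term: penalize drift from the pre-trained
    # weights to alleviate generalization erosion.
    l_reg = np.mean((weights - weights_pretrained) ** 2)
    return w_attn * l_attn + w_prior * l_prior + w_reg * l_reg
```

In this sketch, the prior and regularization terms pull in the same direction (stay close to the original model) while the attention term pushes the targeted concept away, which is the balance the three bullets above describe.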