🤖 AI Summary
Generative image models can produce harmful or copyright-infringing content, motivating effective machine unlearning. However, existing methods struggle to achieve high unlearning quality while preserving text–image alignment, because the two objectives are inherently in tension. This paper proposes an iterative optimization framework that seeks, at each unlearning step, a model update yielding monotonic improvement on both unlearning quality and alignment fidelity. The approach combines gradient-constrained diffusion optimization, targeted class/concept unlearning, dynamic data resampling, and alignment-aware loss design; it further incorporates strategies for diversifying the unlearning and remaining datasets and a theoretical characterization of the constrained gradient updates. Evaluation on Stable Diffusion demonstrates complete removal of target classes and concepts, significantly lower degradation in FID and CLIP-Score than state-of-the-art methods, and text–image alignment fidelity approaching that of the original model.
📝 Abstract
Large-scale generative models have shown impressive image-generation capabilities, propelled by massive training data. However, this often inadvertently leads to the generation of harmful or inappropriate content and raises copyright concerns. Driven by these concerns, machine unlearning has become crucial for effectively purging undesirable knowledge from models. While existing literature has studied various unlearning techniques, these often suffer from either poor unlearning quality or degraded text–image alignment after unlearning, due to the competing nature of these objectives. To address these challenges, we propose a framework that seeks an optimal model update at each unlearning iteration, ensuring monotonic improvement on both objectives. We further derive a characterization of such an update. In addition, we design procedures to strategically diversify the unlearning and remaining datasets to boost performance. Our evaluation demonstrates that our method effectively removes target classes from recent diffusion-based generative models and concepts from Stable Diffusion models while maintaining close alignment with the models' original trained states, thus outperforming state-of-the-art baselines. Our code will be made available at https://github.com/reds-lab/Restricted_gradient_diversity_unlearning.git.