🤖 AI Summary
This work proposes a compensation-free machine unlearning method for text-to-image diffusion models that precisely removes specific target concepts without degrading the generation quality of unrelated content. Existing approaches often impair overall model utility through over-deletion and rely on post-hoc compensation mechanisms that fail to fully preserve the model's original capabilities. In contrast, the proposed method, MiM-MU, minimizes the mutual information between the target concept and the model's outputs, effectively erasing sensitive knowledge while preserving the generative distribution of all other concepts. Notably, it achieves efficient concept removal without model retraining or additional constraints, preserving unrelated generation quality better than current state-of-the-art techniques while thoroughly eliminating the designated sensitive content.
📝 Abstract
The powerful generative capabilities of diffusion models have raised growing privacy and safety concerns about the generation of sensitive or undesired content. In response, machine unlearning (MU) -- commonly referred to as concept erasure (CE) in diffusion models -- has been introduced to remove specific knowledge from model parameters while preserving innocent knowledge. Despite recent advances, existing unlearning methods often suffer from excessive and indiscriminate removal, which substantially degrades the quality of innocent generations. To preserve model utility, prior works rely on compensation, i.e., re-assimilating a subset of the remaining data or explicitly constraining the divergence from the pre-trained model on remaining concepts. However, we reveal that generations beyond the compensation scope still suffer, suggesting that such post-remedial compensations are inherently insufficient for preserving the general utility of large-scale generative models. Therefore, in this paper, we advocate for compensation-free concept erasure operations that precisely identify and eliminate the undesired knowledge so that the impact on other generations is minimal. Technically, we propose MiM-MU, which unlearns a concept by minimizing the mutual information between the target concept and the model's outputs, with a careful design for computational efficiency and for maintaining the sampling distribution of other concepts. Extensive evaluations demonstrate that our method achieves effective concept removal while maintaining high-quality generations for other concepts and, remarkably, does so for the first time without relying on any post-remedial compensation.
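The abstract does not spell out the concrete objective, but the core idea -- making the model's output carry no information about the erased concept -- can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's actual MiM-MU objective or API: `ToyDenoiser`, `erasure_loss`, and `c_null` are hypothetical names, and the loss shown is a common surrogate for mutual-information minimization (pulling the target-conditioned noise prediction toward a frozen copy's concept-free prediction, so the output no longer depends on the erased concept).

```python
import copy

import torch
import torch.nn as nn


class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for a diffusion U-Net: predicts noise
    from a noisy latent x and a concept embedding c."""

    def __init__(self, dim: int = 8):
        super().__init__()
        self.net = nn.Linear(dim * 2, dim)

    def forward(self, x: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, c], dim=-1))


def erasure_loss(model, frozen, x, c_target, c_null):
    """Surrogate for minimizing mutual information between the target
    concept and the output: align the target-conditioned prediction with
    the frozen pre-trained model's concept-free prediction, so generations
    become independent of the erased concept."""
    eps_target = model(x, c_target)
    with torch.no_grad():
        eps_anchor = frozen(x, c_null)  # "concept-free" anchor, kept fixed
    return ((eps_target - eps_anchor) ** 2).mean()


torch.manual_seed(0)
model = ToyDenoiser()
frozen = copy.deepcopy(model).eval()  # frozen pre-trained reference
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

x = torch.randn(4, 8)         # toy noisy latents
c_target = torch.randn(4, 8)  # embedding of the concept to erase
c_null = torch.zeros(4, 8)    # unconditional / null embedding

loss_before = erasure_loss(model, frozen, x, c_target, c_null).item()
for _ in range(200):
    opt.zero_grad()
    loss = erasure_loss(model, frozen, x, c_target, c_null)
    loss.backward()
    opt.step()
loss_after = loss.item()
```

Note that only the target-conditioned branch is trained against a frozen anchor; the paper's claim is that a precise objective of this kind can avoid the post-remedial compensation (e.g., replaying remaining data) that prior erasure methods need to restore unrelated generations.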