🤖 AI Summary
In data-constrained settings where the full training data is inaccessible, diffusion models struggle to safely forget undesirable content (e.g., violent or pornographic features). To address this, we propose an efficient machine unlearning method grounded in variational inference. Our approach requires only a small set of samples exhibiting the target undesirable features and jointly optimizes a plasticity-inducing term and a stability-regularizing term to steer the parameter updates of the pre-trained diffusion model. This work is the first to systematically formulate diffusion model unlearning within a variational inference framework. It effectively suppresses the generation probability of targeted classes or attributes while preserving overall sample fidelity and diversity. Extensive experiments on MNIST, CIFAR-10, tinyImageNet, and Stable Diffusion demonstrate state-of-the-art performance in both class-level and attribute-level forgetting, along with strong practicality and cross-dataset generalization.
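As a rough sketch (our notation, not taken verbatim from the paper), the two terms can be read as an objective of the form

$$
\min_{\theta}\;\underbrace{\mathbb{E}_{x \sim \mathcal{D}_f}\!\big[\log p_{\theta}(x)\big]}_{\text{plasticity-inducing term}} \;+\; \underbrace{\tfrac{\lambda}{2}\,\lVert \theta - \theta_{0} \rVert_2^{2}}_{\text{stability-regularizing term}},
$$

where $\mathcal{D}_f$ is the small set of undesirable samples, $\theta_0$ denotes the pre-trained weights, and $\lambda$ balances forgetting against generation quality. For a DDPM, $\log p_\theta(x)$ would in practice be replaced by its variational (noise-prediction) surrogate; the exact variational-inference formulation used by VDU may differ from this simplified form.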
📝 Abstract
For the responsible and safe deployment of diffusion models across domains, regulating their generated outputs is desirable, because such models can produce undesired, violent, or obscene outputs. To tackle this problem, recent works apply machine unlearning methodology to make pre-trained generative models forget training data points containing these undesired features. However, these methods prove ineffective in data-constrained settings where the whole training dataset is inaccessible. The principal objective of this work is therefore to propose a machine unlearning methodology that prevents a pre-trained diffusion model from generating outputs containing undesired features in such a data-constrained setting. Our proposed method, termed Variational Diffusion Unlearning (VDU), is computationally efficient and requires access only to a subset of the training data containing the undesired features. Our approach is inspired by the variational inference framework and minimizes a loss function consisting of two terms: a plasticity inducer and a stability regularizer. The plasticity inducer reduces the log-likelihood of the undesired training data points, while the stability regularizer, essential for preventing loss of image generation quality, regularizes the model in parameter space. We validate the effectiveness of our method through comprehensive experiments on both class unlearning and feature unlearning. For class unlearning, we unlearn user-identified classes of the MNIST, CIFAR-10, and tinyImageNet datasets from a pre-trained unconditional denoising diffusion probabilistic model (DDPM). Similarly, for feature unlearning, we unlearn the generation of certain high-level features from a pre-trained Stable Diffusion model.
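To make the two-term objective concrete, the PyTorch sketch below performs gradient steps that increase the standard DDPM noise-prediction loss on a small forget set (a proxy for lowering its log-likelihood) while penalizing drift from the pre-trained weights in parameter space. The network, the helper names (`Denoiser`, `vdu_step`), the hyperparameters, and the exact form of both terms are our own simplifications for illustration; they are not the paper's implementation.

```python
# Hypothetical sketch of VDU-style updates on a DDPM denoiser.
# All names and hyperparameters here are illustrative, not from the paper.
import copy
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Stand-in epsilon-prediction network; a real DDPM would use a U-Net."""
    def __init__(self, dim=784):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 256), nn.SiLU(), nn.Linear(256, dim))

    def forward(self, x_t, t):
        # Condition on the (normalized) timestep by simple concatenation.
        return self.net(torch.cat([x_t, t[:, None].float() / 1000.0], dim=-1))

def diffusion_loss(model, x0, alphas_cumprod):
    """Standard DDPM noise-prediction loss, a surrogate for -log p_theta(x0)."""
    t = torch.randint(0, len(alphas_cumprod), (x0.shape[0],))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t][:, None]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return ((model(x_t, t) - noise) ** 2).mean()

def vdu_step(model, frozen_ref, forget_batch, alphas_cumprod, opt, lam=1e2):
    """One unlearning step: raise the denoising error on the forget set
    (plasticity inducer) while staying close to the pre-trained weights
    in parameter space (stability regularizer)."""
    plasticity = -diffusion_loss(model, forget_batch, alphas_cumprod)
    stability = sum(((p - q) ** 2).sum()
                    for p, q in zip(model.parameters(), frozen_ref.parameters()))
    loss = plasticity + 0.5 * lam * stability
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage: unlearn a batch of flattened 28x28 "undesired" samples.
model = Denoiser()
frozen_ref = copy.deepcopy(model).requires_grad_(False)  # pre-trained reference theta_0
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
forget_batch = torch.randn(16, 784)  # placeholder for real forget-set images
for _ in range(10):
    vdu_step(model, frozen_ref, forget_batch, alphas_cumprod, opt)
```

Note that the negated noise-prediction loss is unbounded below, so in practice the regularization weight (and possibly a cap on the plasticity term) governs how far the model drifts from its pre-trained behavior; the paper's variational treatment of this trade-off is not reproduced here.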