🤖 AI Summary
This work addresses the challenge of effectively unlearning concentrated distributions of forget data in large generative models. The authors cast the unlearning task as density ratio estimation and propose a two-stage inference procedure: first, tempering (annealing) is applied to flatten the high-confidence peaks of the original model's distribution; second, a lightweight classifier trained to discriminate retained from forgotten samples tilts the tempered distribution away from the forget set. Theoretical analysis provides the first formal justification that tempering is necessary when unlearning concentrated distributions and establishes finite-sample guarantees on unlearning error. By freezing the base model and fine-tuning only a minimal set of parameters, the method achieves significantly improved unlearning quality and generation utility on the TOFU benchmark, with substantially lower training overhead and runtime than existing approaches.
📝 Abstract
We study machine unlearning in large generative models by framing the task as density ratio estimation toward a target distribution rather than as supervised fine-tuning. While classifier guidance is a standard approach for approximating this ratio and can succeed in general, we show it can fail to faithfully unlearn with finite samples when the forget set represents a sharp, concentrated data distribution. To address this, we introduce Temper-Then-Tilt Unlearning (T3-Unlearning), which freezes the base model and applies a two-step inference procedure: (i) tempering the base distribution to flatten high-confidence spikes, and (ii) tilting the tempered distribution using a lightweight classifier trained to distinguish retain from forget samples. Our theoretical analysis provides finite-sample guarantees linking the surrogate classifier's risk to unlearning error, proving that tempering is necessary to successfully unlearn concentrated distributions. Empirical evaluations on the TOFU benchmark show that T3-Unlearning improves forget quality and generative utility over existing baselines, while training only a fraction of the parameters with minimal runtime.
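The two inference-time steps described above (temper, then tilt) can be sketched on next-token logits. This is a minimal illustrative sketch, not the paper's actual implementation: the function name, the temperature `T`, the tilt strength `lam`, and the assumption that the classifier supplies a per-token log retain/forget ratio are all hypothetical choices for illustration.

```python
import numpy as np

def temper_then_tilt(base_logits, classifier_log_ratio, T=2.0, lam=1.0):
    """Illustrative temper-then-tilt step over a vocabulary of next-token logits.

    base_logits: logits from the frozen base model (hypothetical interface).
    classifier_log_ratio: per-token log p(retain)/p(forget) from a lightweight
        classifier (an assumed form of the paper's tilting signal).
    """
    # Step (i) temper: dividing logits by T > 1 flattens high-confidence spikes,
    # which is where classifier guidance alone can fail on concentrated data.
    tempered = base_logits / T
    # Step (ii) tilt: add the classifier's log-ratio, scaled by lam, to shift
    # probability mass away from forget-associated tokens.
    tilted = tempered + lam * classifier_log_ratio
    # Numerically stable softmax back to a distribution.
    z = tilted - tilted.max()
    probs = np.exp(z)
    return probs / probs.sum()
```

For a sharply peaked base distribution, tempering spreads mass before the tilt is applied; without step (i), a finite-sample classifier signal may be too weak to move mass off the spike, which is the failure mode the paper's analysis formalizes.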