AI Summary
This work proposes a machine unlearning framework grounded in the principle of distributional indistinguishability, which is introduced for the first time as an explicit unlearning objective. To mitigate the degradation of model generalization commonly caused by unstable optimization in existing approximate unlearning methods, the approach leverages an independent holdout dataset to provide class-conditional reference signals. Through reference-guided distillation and distribution alignment, the model's outputs on forgotten data are steered to resemble its responses to unseen data. Extensive experiments across diverse model architectures, natural image datasets, and varying forgetting ratios demonstrate that the proposed method consistently outperforms current baselines, effectively removing the influence of the targeted data while better preserving overall model performance.
Abstract
Machine unlearning aims to remove the influence of specific data from trained models while preserving general utility. Existing approximate unlearning methods often rely on performance-degradation heuristics, such as loss maximization or random labeling. However, these signals can be poorly conditioned, leading to unstable optimization and harming the model's generalization. We argue that unlearning should instead prioritize distributional indistinguishability, aligning the model's behavior on forget data with its behavior on truly unseen data. Motivated by this, we propose Reference-Guided Unlearning (ReGUn), a framework that leverages a disjoint held-out dataset to provide a principled, class-conditioned reference for distillation. We demonstrate across various model architectures, natural image datasets, and varying forget fractions that ReGUn consistently outperforms standard approximate baselines, achieving a superior forgetting-utility trade-off.
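The reference-guided alignment described above can be illustrated with a minimal sketch. Assuming (hypothetically; the paper's exact loss is not given here) that the held-out set supplies a per-class average output distribution, and that the forget set is aligned to those references via a KL-divergence term, one possible implementation is:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def class_reference_distributions(holdout_logits, holdout_labels, num_classes):
    """Class-conditional references: the model's average softmax output
    on held-out (unseen) samples of each class."""
    probs = softmax(holdout_logits)
    refs = np.zeros((num_classes, probs.shape[1]))
    for c in range(num_classes):
        refs[c] = probs[holdout_labels == c].mean(axis=0)
    return refs

def alignment_loss(forget_logits, forget_labels, refs, eps=1e-12):
    """KL(reference || model) averaged over the forget set: steers the
    model's outputs on forget data toward its unseen-data behavior."""
    probs = softmax(forget_logits)
    target = refs[forget_labels]
    kl = np.sum(target * (np.log(target + eps) - np.log(probs + eps)), axis=1)
    return kl.mean()
```

In practice this term would be minimized by gradient descent on the model parameters (here the logits stand in for model outputs), typically alongside a utility-preserving loss on retained data; all function names and the specific KL direction are assumptions for illustration.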