🤖 AI Summary
This work addresses the challenge of unlearning specific data from pretrained models, where completely erasing memorized information remains difficult. The authors introduce a new metric, “relearning convergence delay”, that jointly evaluates residual memory in both weight space and prediction space, and propose an influence-removal-based machine unlearning framework. Guided by theoretical analysis, the method combines weight decay and noise injection to attenuate the influence of the data targeted for forgetting while preserving performance on the retained data. Experiments on both classification and generative tasks indicate that the approach comes close to ideal unlearning performance and outperforms existing methods in accuracy on retained data and in robustness against relearning attacks.
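The summary does not say exactly how relearning convergence delay is computed. A minimal sketch follows. It assumes the delay is measured as the number of fine-tuning steps an unlearned model needs to regain a target accuracy on the forget set. Because the count depends on where optimization starts in weight space and on when predictions recover, it touches both spaces the metric is said to cover. The toy logistic-regression model, the function `relearn_steps`, and the parameters `lr`, `target_acc`, and `max_steps` are hypothetical illustrations, not the paper's definition.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def accuracy(w, X, y):
    # Fraction of samples whose thresholded prediction matches the 0/1 label.
    return float(np.mean((sigmoid(X @ w) >= 0.5) == y))


def relearn_steps(w_init, X_forget, y_forget, lr=0.5, target_acc=0.95, max_steps=10_000):
    """Count full-batch gradient steps needed to refit the forget set.

    Hypothetical proxy for relearning convergence delay: a higher count
    suggests the starting weights hold less readily recoverable
    information about the forget set.
    """
    w = w_init.copy()
    for step in range(max_steps):
        if accuracy(w, X_forget, y_forget) >= target_acc:
            return step
        # Gradient of the mean binary cross-entropy for logistic regression.
        grad = X_forget.T @ (sigmoid(X_forget @ w) - y_forget) / len(y_forget)
        w = w - lr * grad
    # Budget exhausted: report the cap.
    return max_steps
```

One natural reference point, again an assumption, is the count obtained for a model retrained from scratch without the forget set. An unlearned model that relearns much faster than that reference still carries residual memory that a relearning attack could exploit.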
📝 Abstract
Removing mislabeled, contaminated, or otherwise problematic data from a pretrained model is a central challenge for machine unlearning. Current unlearning approaches and evaluation metrics focus solely on model predictions, which offers limited insight into what the model has truly retained about the underlying data. To address this issue, we introduce a new metric, relearning convergence delay, which captures changes in both weight space and prediction space and thus provides a more comprehensive assessment of what the model still knows about the forgotten dataset. The metric can also be used to assess the risk that forgotten data can be recovered from the unlearned model. Building on it, we propose the Influence Eliminating Unlearning framework, which removes the influence of the forgetting set by degrading the model's performance on it, applying weight decay, and injecting noise into the model's weights, while maintaining accuracy on the retaining set. Extensive experiments show that our method outperforms existing methods under both existing metrics and our proposed relearning convergence delay metric, approaching ideal unlearning performance. We provide theoretical guarantees, including exponential convergence and upper bounds, as well as empirical evidence of strong retention and resistance to relearning in both classification and generative unlearning tasks.
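The abstract names the ingredients of Influence Eliminating Unlearning but not the exact update rule. Those ingredients are degrading forget-set performance, weight decay, and noise injection into the weights, all subject to keeping retain-set accuracy. The sketch below shows one plausible way to combine them in a single step for a toy logistic-regression model. It substitutes gradient ascent on the forget loss for "degrading its performance", which is an assumption. The names `unlearn_step`, `forget_weight`, `weight_decay`, and `noise_std` are hypothetical, not the authors' API.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_grad(w, X, y):
    # Gradient of the mean binary cross-entropy for a logistic-regression model.
    return X.T @ (sigmoid(X @ w) - y) / len(y)


def unlearn_step(w, X_retain, y_retain, X_forget, y_forget, rng,
                 lr=0.1, forget_weight=1.0, weight_decay=1e-2, noise_std=1e-2):
    """One hypothetical update combining the ingredients named in the abstract."""
    # Descend on the retain loss to preserve accuracy on the retaining set.
    g = bce_grad(w, X_retain, y_retain)
    # Ascend on the forget loss (subtract its gradient) to degrade the forget-set fit.
    g -= forget_weight * bce_grad(w, X_forget, y_forget)
    # Weight decay: the gradient of an L2 penalty pulls the weights toward zero.
    g += weight_decay * w
    w_new = w - lr * g
    # Noise injection: perturb the updated weights with isotropic Gaussian noise.
    return w_new + rng.normal(0.0, noise_std, size=w.shape)
```

Setting `forget_weight=0` and `noise_std=0` reduces the step to ordinary L2-regularized gradient descent on the retain set, which makes each term easy to ablate. Unbounded gradient ascent can diverge, so a practical variant would likely cap or schedule `forget_weight`.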