🤖 AI Summary
This work proposes a novel “remember to forget” paradigm to address the degradation of performance on retained data commonly observed in gradient-ascent-based machine unlearning methods. Instead of directly modifying the original model via unstable gradient ascent, the approach trains an auxiliary model to memorize the data targeted for removal and then obtains the forget model by extrapolating in parameter space from this auxiliary model relative to the original reference model, using only gradient descent. A prediction-consistency constraint further enables stable training and rapid convergence without any gradient ascent. Experimental results across multiple tasks demonstrate that the proposed technique significantly improves unlearning efficacy while effectively preserving the model’s predictive performance on retained data.
📝 Abstract
For ethical and safe AI, machine unlearning has emerged as a critical topic aimed at protecting sensitive, private, and copyrighted knowledge from misuse. To achieve this goal, it is common to conduct gradient ascent (GA) to reverse the training on undesired data. However, such a reversal is prone to catastrophic collapse, leading to serious performance degradation on general tasks. As a solution, we propose model extrapolation as an alternative to GA: given a reference model, it reaches the opposite direction in the hypothesis space from another model. Concretely, we use the original model as the reference and further train it to memorize the undesired data while keeping prediction consistency on the remaining retained data, yielding a memorization model. Counterintuitive as it might sound, a forget model can then be obtained via extrapolation from the memorization model to the reference model. Hence, we avoid acquiring the forget model directly through GA and instead proceed with gradient descent on the memorization model, which stabilizes the machine unlearning process. Our model extrapolation is simple and efficient to implement, and it converges reliably throughout training to achieve improved unlearning performance.
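To make the extrapolation step concrete, the following is a minimal sketch of one plausible reading of the abstract: starting from the memorization model, step through the reference model and beyond, i.e. `theta_forget = theta_ref + alpha * (theta_ref - theta_mem)`. The update rule and the `alpha` strength hyperparameter are assumptions for illustration; the paper's exact formulation may differ. Parameters are represented as plain dicts of weight lists to keep the sketch framework-agnostic.

```python
def extrapolate(reference, memorization, alpha=1.0):
    """Hypothetical model-extrapolation step for unlearning.

    Computes theta_forget = theta_ref + alpha * (theta_ref - theta_mem)
    per parameter, moving away from the memorization model in the
    direction of the reference model. `alpha` (assumed) controls how
    far past the reference the forget model lands.
    """
    return {
        name: [r + alpha * (r - m) for r, m in zip(ref_w, memorization[name])]
        for name, ref_w in reference.items()
    }

# Toy usage: the memorization model drifted toward the undesired data,
# so the forget model is pushed an equal distance in the opposite direction.
reference = {"w": [1.0, 2.0]}
memorization = {"w": [1.5, 2.5]}
forget = extrapolate(reference, memorization, alpha=1.0)
```

In a real setting the same arithmetic would be applied over a framework's parameter containers (e.g. a PyTorch `state_dict`), with `alpha` tuned on held-out retained data.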