🤖 AI Summary
This work addresses key limitations of existing data unlearning methods for machine learning models: reliance on Hessian computation, restriction to convex empirical risk minimization (ERM), and poor scalability to high-dimensional, non-convex, over-parameterized settings. We propose an efficient, provably correct online unlearning algorithm. Its core innovation is an affine stochastic recursion that maintains per-sample statistical vectors implicitly encoding second-order information, enabling Newton-style online updates without explicit Hessian computation. The method operates directly on non-convergent training trajectories and provides formal unlearning certification. It reduces time and memory overhead by several orders of magnitude, achieves millisecond-scale deletion latency, improves test accuracy, and establishes the first theoretical bounds linking deletion capacity to generalization error.
📝 Abstract
Machine unlearning strives to uphold data owners' right to be forgotten by enabling models to selectively forget specific data. Recent advances pre-compute and store statistics extracted from second-order information and implement unlearning through Newton-style updates. However, Hessian matrix operations are extremely costly, and prior works perform unlearning only for the empirical risk minimizer under a convexity assumption, which precludes applicability to high-dimensional over-parameterized models and to non-convergent training. In this paper, we propose an efficient Hessian-free unlearning approach. The key idea is to maintain a statistical vector for each training sample, computed through an affine stochastic recursion on the difference between the retrained and learned models. We prove that our proposed method outperforms state-of-the-art methods in terms of unlearning and generalization guarantees, deletion capacity, and time/storage complexity, under the same regularity conditions. By recollecting the statistics of the data to be removed, we develop an online unlearning algorithm that achieves near-instantaneous data removal, as it requires only vector addition. Experiments demonstrate that our proposed scheme surpasses existing results by orders of magnitude in time/storage costs with millisecond-level unlearning execution, while also enhancing test accuracy.
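To make the mechanism concrete, here is a minimal sketch of the general shape described in the abstract: alongside gradient-descent training of a ridge-regularized linear model, one statistic vector per training sample is maintained by an affine recursion (no Hessian is ever formed or inverted), and deleting a sample later reduces to a single vector addition. All variable names and the exact form of the recursion are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, eta, steps = 32, 5, 0.1, 0.05, 200
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

w = np.zeros(d)        # model parameters
s = np.zeros((n, d))   # one statistic vector per training sample (assumed form)

def grad_i(w, i):
    # per-sample gradient of 0.5*(x_i·w - y_i)^2 + (lam/2n)*||w||^2
    return (X[i] @ w - y[i]) * X[i] + lam * w / n

for _ in range(steps):
    g = sum(grad_i(w, i) for i in range(n)) / n
    w = w - eta * g  # ordinary gradient-descent step
    for i in range(n):
        # Affine recursion (illustrative): contract the statistic and
        # accumulate sample i's scaled gradient contribution. This is the
        # Hessian-free bookkeeping step done during training.
        s[i] = (1 - eta * lam) * s[i] + (eta / n) * grad_i(w, i)

def unlearn(w, i):
    # Deletion at request time is just one vector addition of the
    # precomputed statistic -- hence near-instantaneous removal.
    return w + s[i]

w_after = unlearn(w, 3)
```

The point of the sketch is the cost profile, not the exact update rule: the per-sample bookkeeping is amortized into training, so the deletion path itself touches only one stored vector.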