🤖 AI Summary
This work addresses key limitations of existing data unlearning methods for machine learning models: reliance on Hessian computation, restriction to convex empirical risk minimization (ERM), and poor scalability to high-dimensional, non-convex, over-parameterized settings. We propose an efficient, provably correct online unlearning algorithm. Its core innovation is an affine stochastic recursion that maintains per-sample statistical vectors implicitly encoding second-order information, enabling Newton-style online updates without explicit Hessian computation. The method operates directly on non-convergent training trajectories and provides formal unlearning certification. It reduces time and memory overhead by several orders of magnitude, achieves millisecond-scale deletion latency, improves test accuracy, and establishes the first theoretical bounds linking deletion capacity to generalization error.
📝 Abstract
Machine unlearning strives to uphold data owners' right to be forgotten by enabling models to selectively forget specific data. Recent advances pre-compute and store statistics extracted from second-order information and implement unlearning through Newton-style updates. However, Hessian matrix operations are extremely costly, and prior works perform unlearning only for the empirical risk minimizer under a convexity assumption, which precludes applicability to high-dimensional over-parameterized models and to non-convergent training. In this paper, we propose an efficient Hessian-free unlearning approach. The key idea is to maintain a statistical vector for each training sample, computed through an affine stochastic recursion on the difference between the retrained and learned models. We prove that our proposed method outperforms state-of-the-art methods in terms of unlearning and generalization guarantees, deletion capacity, and time/storage complexity, under the same regularity conditions. By recollecting the statistics of the data to be removed, we develop an online unlearning algorithm that achieves near-instantaneous data removal, as it requires only vector addition. Experiments demonstrate that our proposed scheme surpasses existing results by orders of magnitude in time/storage costs with millisecond-level unlearning execution, while also enhancing test accuracy.
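To make the mechanism concrete, here is a minimal sketch of the general shape described in the abstract: alongside gradient-descent training of a ridge-regularized linear model, one statistic vector per training sample is maintained by an affine recursion (no Hessian is ever formed or inverted), and deleting a sample later reduces to a single vector addition. All variable names and the exact form of the recursion are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam, eta, steps = 32, 5, 0.1, 0.05, 200
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

w = np.zeros(d)        # model parameters
s = np.zeros((n, d))   # one statistic vector per training sample (assumed form)

def grad_i(w, i):
    # per-sample gradient of 0.5*(x_i·w - y_i)^2 + (lam/2n)*||w||^2
    return (X[i] @ w - y[i]) * X[i] + lam * w / n

for _ in range(steps):
    g = sum(grad_i(w, i) for i in range(n)) / n
    w = w - eta * g  # ordinary gradient-descent step
    for i in range(n):
        # Affine recursion (illustrative): contract the statistic and
        # accumulate sample i's scaled gradient contribution. This is the
        # Hessian-free bookkeeping step done during training.
        s[i] = (1 - eta * lam) * s[i] + (eta / n) * grad_i(w, i)

def unlearn(w, i):
    # Deletion at request time is just one vector addition of the
    # precomputed statistic -- hence near-instantaneous removal.
    return w + s[i]

w_after = unlearn(w, 3)
```

The point of the sketch is the cost profile, not the exact update rule: the per-sample bookkeeping is amortized into training, so the deletion path itself touches only one stored vector.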