🤖 AI Summary
This work proposes a statistical unlearning framework under a general loss function to efficiently implement machine unlearning, thereby complying with data-deletion regulations and mitigating the impact of contaminated data. Focusing on squared loss, the authors develop Unlearning Least Squares (ULS), which accurately estimates the optimal parameters for the remaining data using only a pre-trained model, the samples to be forgotten, and a small amount of retained data. Theoretical analysis establishes the minimax optimality of ULS, showing that its estimation error decomposes into an oracle term and an "unlearning cost" determined by the fraction of data removed and the bias of the forgotten model. The framework further supports asymptotically valid inference without full retraining. Empirical results demonstrate that ULS nearly matches full retraining while requiring substantially less data access.
📝 Abstract
There is a growing demand for efficient data removal to comply with regulations like the GDPR and to mitigate the influence of biased or corrupted data. This has motivated the field of machine unlearning, which aims to eliminate the influence of specific data subsets without the cost of full retraining. In this work, we propose a statistical framework for machine unlearning with generic loss functions and establish theoretical guarantees. For squared loss in particular, we develop Unlearning Least Squares (ULS) and establish its minimax optimality for estimating the model parameter of the remaining data when only the pre-trained estimator, the forget samples, and a small subsample of the remaining data are available. Our results reveal that the estimation error decomposes into an oracle term and an unlearning cost determined by the forget proportion and the forget model bias. We further establish asymptotically valid inference procedures without requiring full retraining. Numerical experiments and real-data applications demonstrate that the proposed method achieves performance close to retraining while requiring substantially less data access.
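To make the setting concrete, the sketch below shows the classical exact "downdate" for least squares: subtracting the forget samples' contribution from the normal-equation sufficient statistics recovers the retrained estimator without touching the retained rows. This is only an illustration of the idea that unlearning methods such as ULS approximate; it is not the authors' ULS estimator, which works from the fitted model, the forget samples, and a small retained subsample rather than stored Gram matrices. All variable names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n samples, d features, linear model with small noise.
n, d = 1000, 5
X = rng.normal(size=(n, d))
beta_true = np.arange(1, d + 1, dtype=float)
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Pre-trained least-squares fit on the full data, with its
# sufficient statistics (Gram matrix and cross-moment vector).
G_full = X.T @ X
b_full = X.T @ y
beta_full = np.linalg.solve(G_full, b_full)

# Forget set: suppose the first m samples must be removed.
m = 100
X_f, y_f = X[:m], y[:m]

# Exact unlearning: downdate the sufficient statistics using only
# the forget samples, never revisiting the retained rows.
G_retain = G_full - X_f.T @ X_f
b_retain = b_full - X_f.T @ y_f
beta_unlearn = np.linalg.solve(G_retain, b_retain)

# Oracle baseline: retrain from scratch on the retained data.
beta_retrain = np.linalg.lstsq(X[m:], y[m:], rcond=None)[0]

print(np.allclose(beta_unlearn, beta_retrain))  # True: downdate is exact
```

For squared loss this downdate is exact, which is what makes least squares a natural testbed; the statistical question the paper studies is how much precision is lost when the full sufficient statistics are unavailable and only the pre-trained estimator plus a small retained subsample can be used.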