🤖 AI Summary
Regulatory frameworks such as the GDPR impose stringent requirements for *machine unlearning*—i.e., provably removing the influence of specific training instances—yet existing methods either lack formal privacy guarantees or incur prohibitive computational overhead.
Method: This paper proposes EUPG, the first framework to integrate *k-anonymity* and *ε-differential privacy* into the machine unlearning pipeline, enabling efficient, instance-level unlearning with rigorous, end-to-end privacy certification. EUPG combines privacy-enhanced data preprocessing with a forgetting mechanism specifically designed for pretrained models.
Contribution/Results: Evaluated on four heterogeneous benchmarks, EUPG achieves unlearning fidelity comparable to full retraining while reducing unlearning computation by 62% and model storage overhead by 58%. Crucially, it satisfies strict formal privacy constraints, namely both k-anonymity and ε-differential privacy, throughout the unlearning process. The implementation is publicly available.
📝 Abstract
Privacy protection laws, such as the GDPR, grant individuals the right to request the forgetting of their personal data not only from databases but also from machine learning (ML) models trained on them. Machine unlearning has emerged as a practical means to facilitate model forgetting of data instances seen during training. Although some existing machine unlearning methods guarantee exact forgetting, they are typically costly in computational terms. On the other hand, more affordable methods do not offer forgetting guarantees and are applicable only to specific ML models. In this paper, we present *efficient unlearning with privacy guarantees* (EUPG), a novel machine unlearning framework that offers formal privacy guarantees to individuals whose data are being unlearned. EUPG involves pre-training ML models on data protected using privacy models, and it enables *efficient unlearning with the privacy guarantees offered by the privacy models in use*. Through empirical evaluation on four heterogeneous data sets protected with *k*-anonymity and *ε*-differential privacy as privacy models, our approach demonstrates utility and forgetting effectiveness comparable to those of exact unlearning methods, while significantly reducing computational and storage costs. Our code is available at https://github.com/najeebjebreel/EUPG.
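As a rough illustration of the abstract's core idea, pre-training on data protected with a privacy model so that a later forgetting request is cheap, the sketch below applies a Laplace mechanism (a standard way to obtain *ε*-differential privacy for bounded numeric features) to a toy data set and then drops one record on an unlearning request. All names, parameters, and the mechanism choice here are illustrative assumptions for exposition; EUPG's actual forgetting mechanism is the one described in the paper and the linked repository.

```python
# Illustrative sketch only (not EUPG itself): protect training data with a
# privacy model up front, so unlearning a record reduces to removing it from
# the protected set and refitting, with the privacy model bounding what the
# model can reveal about the removed individual.
import numpy as np

def laplace_protect(X, epsilon, lower, upper):
    """Add Laplace noise calibrated to the per-feature range (sensitivity)."""
    sensitivity = upper - lower
    scale = sensitivity / epsilon
    noise = np.random.laplace(0.0, scale, size=X.shape)
    # Clip back into the valid domain after noising.
    return np.clip(X + noise, lower, upper)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(100, 3))            # toy features in [0, 1]
X_priv = laplace_protect(X, epsilon=1.0, lower=0.0, upper=1.0)

# Unlearning request for record i: drop it from the protected training set
# and refit the model on the remainder (no full retrain on raw data needed).
i = 42
X_remaining = np.delete(X_priv, i, axis=0)
```

The point of the sketch is the workflow, not the mechanism: any privacy model with formal guarantees (the paper also uses *k*-anonymity) could play the role of `laplace_protect` here.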