🤖 AI Summary
This work addresses machine unlearning (MU) for image classification with deep neural networks (DNNs). It systematically evaluates 18 state-of-the-art unlearning algorithms across multiple models and datasets, with 10 random initializations per configuration and over 100,000 trained models in total, establishing a large-scale, multi-initialization DNN unlearning benchmark assessed from multiple attack perspectives: population-based membership inference attacks (MIA) and per-sample unlearning likelihood ratio attacks (U-LiRA). The experiments show that, with proper hyperparameters, Masked Small Gradients (MSG) and Convolution Transpose (CT) consistently outperform the other evaluated methods in unlearning efficacy, accuracy retention, and run-time efficiency. The benchmark also exposes weaknesses in commonly used baselines such as Gradient Ascent (GA) and Successive Random Relabeling (SRL), and argues that stronger, properly tuned baselines such as Negative Gradient Plus (NG+) should become standard points of comparison. Finally, the authors open-source a unified evaluation framework and comprehensive benchmark results to foster reproducible research.
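The population-based MIA used as an evaluation signal can be sketched at its simplest as a threshold test on per-example losses: if an unlearned model's losses on the forget set are still as low as on training members, the attacker can distinguish them from unseen data. A minimal stdlib-only sketch (toy numbers and the helper name `mia_accuracy` are illustrative, not the paper's actual attack, which is more sophisticated):

```python
import statistics

def mia_accuracy(forget_losses, test_losses):
    """Threshold-based membership inference: pick the threshold as the
    midpoint of the two mean losses and report attack accuracy.
    Near 0.5 means the forget set is indistinguishable from unseen
    data (good unlearning); near 1.0 means it is still memorized."""
    threshold = (statistics.mean(forget_losses) + statistics.mean(test_losses)) / 2
    # predict "member" when loss is below the threshold
    correct = sum(l < threshold for l in forget_losses)
    correct += sum(l >= threshold for l in test_losses)
    return correct / (len(forget_losses) + len(test_losses))

# Forget-set losses still low: the attack separates them perfectly.
print(mia_accuracy([0.1, 0.2, 0.15], [1.0, 1.2, 0.9]))   # 1.0
# Forget-set losses match test losses: the attack is at chance.
print(mia_accuracy([1.0, 1.1, 0.95], [1.0, 1.2, 0.9]))   # 0.5
```

U-LiRA differs from this population-level view by fitting per-sample score distributions across many retrained models, which is why the benchmark's multi-initialization setup (10 seeds, 100K+ models) matters.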
📝 Abstract
Machine unlearning (MU) aims to remove the influence of particular data points from the learnable parameters of a trained machine learning model. This is a crucial capability in light of data privacy requirements, trustworthiness, and safety in deployed models. MU is particularly challenging for deep neural networks (DNNs), such as convolutional nets or vision transformers, as such DNNs tend to memorize a notable portion of their training dataset. Nevertheless, the community lacks a rigorous and multifaceted study of how well MU methods succeed on DNNs. In this paper, we investigate 18 state-of-the-art MU methods across various benchmark datasets and models, with each evaluation conducted over 10 different initializations, a comprehensive evaluation covering unlearning on over 100K models in total. We show that, with proper hyperparameters, Masked Small Gradients (MSG) and Convolution Transpose (CT) consistently perform better in terms of model accuracy and run-time efficiency across different models, datasets, and initializations, as assessed by population-based membership inference attacks (MIA) and per-sample unlearning likelihood ratio attacks (U-LiRA). Furthermore, our benchmark highlights that comparing an MU method only against commonly used baselines, such as Gradient Ascent (GA) or Successive Random Relabeling (SRL), is inadequate, and that better baselines like Negative Gradient Plus (NG+) with proper hyperparameter selection are needed.
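The gradient-based baselines named above differ mainly in how they combine the forget and retain sets: Gradient Ascent (GA) only maximizes loss on the forget set, while Negative Gradient Plus (NG+) additionally keeps descending on the retain set, which is also why its hyperparameters matter. A minimal single-parameter sketch, with toy quadratic losses standing in for the real training objective (the function name, losses, and constants are illustrative, not the paper's code):

```python
def ngplus_step(w, forget_grad, retain_grad, lr, alpha):
    """One NG+ update on a scalar weight: descend on the retain loss
    while ascending, with weight alpha, on the forget loss. Setting
    alpha = 1 and dropping the retain term recovers plain Gradient
    Ascent (GA)."""
    return w - lr * (retain_grad - alpha * forget_grad)

# Toy objectives: the retain loss (w - 1)^2 wants w = 1, while the
# forget loss (w - 3)^2 reflects memorization of the forget set at w = 3.
w = 2.0  # the "trained" weight sits between the two optima
for _ in range(200):
    forget_grad = 2 * (w - 3)  # d/dw of (w - 3)^2
    retain_grad = 2 * (w - 1)  # d/dw of (w - 1)^2
    w = ngplus_step(w, forget_grad, retain_grad, lr=0.05, alpha=0.2)

# w settles near 0.5: pushed away from the memorized optimum at 3
# while staying close to the retain optimum at 1.
```

With alpha too close to 1, the combined objective in this toy loses its minimum and the iterates drift without converging, a one-line illustration of why proper hyperparameter selection is stressed for NG+.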