🤖 AI Summary
Existing machine unlearning research overlooks significant inter-sample heterogeneity in forgetting difficulty, leading to overestimation of forgetting feasibility. Method: The authors conduct an instance-level analysis of forgetting performance across mainstream unlearning algorithms (EU, FT, SISA) and diverse benchmarks (CIFAR-10, ImageNet-1K, IMDb), empirically establishing, for the first time, that forgetting difficulty is an intrinsic property of a training sample, determined solely by the target model and the data distribution and independent of the unlearning algorithm. Using multi-dimensional attribution (gradient sensitivity, influence functions, prediction entropy, and geometric proximity to decision boundaries), they systematically identify four universal hardness factors (e.g., high-influence samples, low-entropy predictions, boundary-proximal instances), achieving AUC > 0.82 in hardness prediction. Contribution/Results: This work shifts machine unlearning evaluation from coarse-grained aggregate metrics toward fine-grained, sample-aware modeling, providing a new benchmark for robustness assessment and algorithm design.
📝 Abstract
Current research on deep machine unlearning primarily focuses on improving or evaluating the overall effectiveness of unlearning methods while overlooking the varying difficulty of unlearning individual training samples. As a result, the broader feasibility of machine unlearning remains under-explored. This paper studies what makes machine unlearning difficult through a thorough instance-level analysis of unlearning performance across various unlearning algorithms and datasets. In particular, we summarize four factors that make unlearning a data point difficult, and we empirically show that these factors are independent of any specific unlearning algorithm and depend only on the target model and its training data. Given these findings, we argue that machine unlearning research should account for the instance-level difficulty of unlearning.
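To make the hardness signals concrete, here is a minimal, hypothetical sketch of two of the per-sample factors the summary names, prediction entropy and gradient sensitivity, computed for a toy linear softmax classifier. The model, function names, and data are illustrative assumptions, not the paper's actual implementation; the paper's other factors (influence functions, boundary proximity) are omitted for brevity.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prediction_entropy(logits):
    """Per-sample entropy of the predictive distribution.
    Low entropy (a confident prediction) is one proposed hardness signal."""
    p = softmax(logits)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

def gradient_sensitivity(x, logits, y):
    """Per-sample L2 norm of the cross-entropy gradient w.r.t. the weights
    of a linear model logits = x @ W, where grad_W = outer(x, p - onehot(y)).
    The Frobenius norm of an outer product factors as ||x|| * ||p - onehot(y)||."""
    r = softmax(logits)
    r[np.arange(len(y)), y] -= 1.0  # residual p - onehot(y)
    return np.linalg.norm(x, axis=1) * np.linalg.norm(r, axis=1)

# Toy data: 4 samples, 3 features, 2 classes (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
y = np.array([0, 1, 0, 1])
logits = X @ W

H = prediction_entropy(logits)    # one hardness feature per sample
G = gradient_sensitivity(X, logits, y)
print(H.shape, G.shape)  # (4,) (4,)
```

In a full pipeline, features like these would be collected for every training point and fed to a simple classifier whose AUC measures how well per-sample forgetting hardness can be predicted before running any unlearning algorithm.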