🤖 AI Summary
Conventional machine unlearning approaches uniformly process all samples slated for removal, incurring prohibitively high computational overhead.
Method: This paper proposes a lightweight unlearning framework based on sample influence screening. It employs influence functions to quantify each training sample’s contribution to model outputs, systematically identifying and excluding low-influence samples prior to unlearning—thereby pruning the target dataset.
Contribution/Results: The paper provides the first empirical validation that low-influence samples can be safely excluded without degrading unlearning efficacy or downstream model performance. This shifts the paradigm away from uniform treatment of all forget-set samples, achieving up to a 50% reduction in computational cost across both language and vision tasks while preserving unlearning accuracy and model utility.
📝 Abstract
As concerns around data privacy in machine learning grow, the ability to unlearn, or remove, specific data points from trained models becomes increasingly important. While state-of-the-art unlearning methods have emerged in response, they typically treat all points in the forget set equally. In this work, we challenge this approach by asking whether points that have a negligible impact on the model's learning need to be removed at all. Through a comparative analysis of influence functions across language and vision tasks, we identify subsets of training data with negligible impact on model outputs. Leveraging this insight, we propose an efficient unlearning framework that reduces the dataset size before unlearning, leading to significant computational savings (up to approximately 50%) on real-world empirical examples.
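The screening step described above can be sketched in code. The following is a minimal illustration, not the paper's implementation: it uses per-sample gradient norms on a linear regression model as a crude stand-in for influence scores (true influence functions involve Hessian-inverse vector products), and the function names (`influence_scores`, `prune_forget_set`) and the `keep_frac` parameter are hypothetical.

```python
import numpy as np

def influence_scores(X, y, w):
    """Cheap influence proxy: per-sample gradient norm of squared loss.

    For loss 0.5 * (x·w - y)^2, the gradient w.r.t. w is (x·w - y) * x.
    Samples whose gradients are near zero barely moved the model,
    so they are candidates to skip during unlearning.
    """
    residuals = X @ w - y
    per_sample_grads = residuals[:, None] * X
    return np.linalg.norm(per_sample_grads, axis=1)

def prune_forget_set(X_f, y_f, w, keep_frac=0.5):
    """Keep only the highest-influence fraction of the forget set."""
    scores = influence_scores(X_f, y_f, w)
    threshold = np.quantile(scores, 1.0 - keep_frac)
    mask = scores >= threshold
    return X_f[mask], y_f[mask]

# Toy data: fit a linear model, then prune a 40-sample forget set.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=100)
w = np.linalg.lstsq(X, y, rcond=None)[0]

X_kept, y_kept = prune_forget_set(X[:40], y[:40], w, keep_frac=0.5)
print(f"forget set pruned from 40 to {len(X_kept)} samples")
```

Only the retained samples would then be passed to the (unchanged) downstream unlearning algorithm, which is where the computational savings come from.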