🤖 AI Summary
This work addresses the theoretical underpinnings of selective forgetting in machine learning models, a process that remains poorly understood despite its practical importance. Adopting an asymptotic linear stability perspective, the study decomposes data coherence into contributions from the retain set, the forget set, and their interaction, revealing the counterintuitive phenomenon that low signal-to-noise-ratio samples, which the model memorizes more strongly, are more readily forgotten. Leveraging random matrix theory, Hessian analysis, and a two-layer ReLU CNN signal-noise model, supported by gradient heatmaps, the authors systematically characterize the coupling between optimization dynamics and data geometry. Theoretical predictions align closely with empirical observations, establishing stability boundaries for gradient-based unlearning methods under batch training, data mixing, and model alignment scenarios.
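To make the coherence decomposition concrete, here is a minimal sketch assuming a toy linear-regression model with squared loss (neither is from the paper): per-sample gradients near the optimum are normalized, and their Gram matrix of cosine alignments is averaged over the retain-retain, forget-forget, and retain-forget blocks. The split indices, the cosine aggregation, and all names are illustrative assumptions, not the authors' exact definition.

```python
# Illustrative sketch of a cross-sample "coherence" decomposition:
# per-sample gradient alignment near an optimum, split into
# retain-retain, forget-forget, and retain-forget blocks.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: linear regression with squared loss; w is the least-squares optimum.
n, d = 40, 10
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
w = np.linalg.lstsq(X, y, rcond=None)[0]

# Per-sample gradients of 0.5 * (x_i @ w - y_i)^2 with respect to w.
residuals = X @ w - y                      # shape (n,)
G = residuals[:, None] * X                 # shape (n, d); row i is sample i's gradient

# Normalize rows so the Gram matrix holds cosine alignments.
G_unit = G / (np.linalg.norm(G, axis=1, keepdims=True) + 1e-12)
C = G_unit @ G_unit.T                      # (n, n) alignment matrix

# Split samples: the first n_f indices form the "forget" set, the rest are retained.
n_f = 10
forget, retain = np.arange(n_f), np.arange(n_f, n)

def mean_alignment(block):
    """Average alignment; excludes self-alignment on square diagonal blocks."""
    if block.shape[0] == block.shape[1]:
        mask = ~np.eye(block.shape[0], dtype=bool)
        return block[mask].mean()
    return block.mean()

print("retain-retain coherence:", mean_alignment(C[np.ix_(retain, retain)]))
print("forget-forget coherence:", mean_alignment(C[np.ix_(forget, forget)]))
print("retain-forget coherence:", mean_alignment(C[np.ix_(retain, forget)]))
```

In this picture, a smaller retain-forget block average corresponds to weaker coupling between the two sets, the regime the summary above associates with easier forgetting.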
📝 Abstract
Machine unlearning, the ability to erase the effect of specific training samples without retraining from scratch, is critical for privacy, regulation, and efficiency. However, most progress in unlearning has been empirical, with little theoretical understanding of when and why unlearning works. We address this gap by framing unlearning through the lens of asymptotic linear stability, capturing the interaction between optimization dynamics and data geometry. The key quantity in our analysis is data coherence: the cross-sample alignment of loss-surface directions near the optimum. We decompose coherence along three axes: within the retain set, within the forget set, and between them, and prove tight stability thresholds that separate convergence from divergence. To further link data properties to forgettability, we study a two-layer ReLU CNN under a signal-plus-noise model and show that stronger memorization makes forgetting easier: when the signal-to-noise ratio (SNR) is low, cross-sample alignment is weak, which reduces coherence and makes unlearning easier; conversely, high-SNR, highly aligned models resist unlearning. For empirical verification, we show that Hessian tests and CNN heatmaps align closely with the predicted boundary, mapping the stability frontier of gradient-based unlearning as a function of batching, mixing, and data/model alignment. Our analysis is grounded in tools from random matrix theory and provides the first principled account of the trade-offs between memorization, coherence, and unlearning.
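As a point of reference for the stability thresholds mentioned above, the following is the textbook linear-stability condition for gradient descent near a minimizer; the paper's thresholds are stated in terms of data coherence and refine this baseline, so this block is background, not the authors' result.

```latex
% Classical linear-stability baseline for gradient descent near a minimizer;
% the paper's coherence-based thresholds refine this picture.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Linearizing the update $\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$ around a
minimizer $\theta^\ast$ with Hessian $H = \nabla^2 L(\theta^\ast)$ gives
\[
  \theta_{t+1} - \theta^\ast \approx (I - \eta H)\,(\theta_t - \theta^\ast),
\]
so for positive definite $H$ the iterates converge iff the spectral radius obeys
\[
  \rho(I - \eta H) < 1
  \quad\Longleftrightarrow\quad
  0 < \eta < \frac{2}{\lambda_{\max}(H)} .
\]
\end{document}
```

This is also the quantity the abstract's "Hessian tests" probe empirically: estimating $\lambda_{\max}(H)$ locates the boundary between convergent and divergent gradient-based unlearning updates.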