🤖 AI Summary
Machine unlearning must remove the influence of designated data without degrading performance on the retained data. This is hardest when the forget and retain gradients are strongly aligned, because a perturbation that increases forget loss then tends to increase retain loss as well. This work proposes ROSU (Retain-Orthogonal Surrogate Unlearning), which adds a retain-neutral constraint to the inner perturbation of min-max unlearning. Under a fixed perturbation budget, ROSU obtains in closed form a perturbation orthogonal to the retain gradient. This makes the first-order change in retain loss zero, and ROSU amplifies the unlearning signal along this retain-neutral direction. The analysis gives a curvature-controlled bound on the second-order retain loss and shows that, when the gradients are positively aligned, ROSU strictly reduces surrogate retain loss relative to standard min-max perturbations. Experiments on CIFAR-10/100, Tiny-ImageNet, TOFU, and WMDP show the clearest retain-performance gains in highly coupled settings, with competitive results elsewhere.
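The closed-form, retain-orthogonal perturbation described above can be illustrated numerically. The following is a minimal sketch, not ROSU's implementation. It assumes a Euclidean perturbation budget `rho` and uses plain Python lists for flattened gradients. The function names (`minmax_perturbation`, `retain_orthogonal_perturbation`, `quadratic_retain_change`) and the toy quadratic retain loss are hypothetical. The sketch also omits the transported outer update and the amplification step.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def norm(u: Vector) -> float:
    return math.sqrt(dot(u, u))


def minmax_perturbation(g_f: Vector, rho: float) -> Vector:
    """Standard (unconstrained) inner step: ascend the forget loss with norm rho."""
    n = norm(g_f)
    return [rho * a / n for a in g_f]


def retain_orthogonal_perturbation(
    g_f: Vector, g_r: Vector, rho: float, eps: float = 1e-12
) -> Vector:
    """Hypothetical sketch: argmax g_f.delta s.t. g_r.delta = 0 and ||delta|| <= rho.

    Projects g_f onto the orthogonal complement of g_r, then rescales to norm rho.
    """
    coef = dot(g_f, g_r) / max(dot(g_r, g_r), eps)
    p = [a - coef * b for a, b in zip(g_f, g_r)]
    n = norm(p)
    if n < eps:
        # g_f is (anti)parallel to g_r: no retain-neutral ascent direction exists.
        return [0.0] * len(g_f)
    return [rho * a / n for a in p]


def quadratic_retain_change(g_r: Vector, hess_diag: Vector, delta: Vector) -> float:
    """Exact retain-loss change for a toy quadratic with diagonal Hessian (assumption)."""
    return dot(g_r, delta) + 0.5 * sum(h * d * d for h, d in zip(hess_diag, delta))


# Toy example: gradients 45 degrees apart (positive alignment), curvature L = 2.
g_f, g_r, rho = [1.0, 1.0], [1.0, 0.0], 0.1
hess = [2.0, 2.0]
d_mm = minmax_perturbation(g_f, rho)
d_ro = retain_orthogonal_perturbation(g_f, g_r, rho)
print(f"min-max: forget gain {dot(g_f, d_mm):.4f}, "
      f"retain change {quadratic_retain_change(g_r, hess, d_mm):.4f}")  # 0.1414, 0.0807
print(f"retain-orthogonal: forget gain {dot(g_f, d_ro):.4f}, "
      f"retain change {quadratic_retain_change(g_r, hess, d_ro):.4f}")  # 0.1000, 0.0100
```

In this toy case the min-max step gains more forget loss but pays a first-order retain cost. The projected step's retain change is only the second-order curvature term. When the two gradients are orthogonal, the projection leaves the forget gradient unchanged and both perturbations coincide.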
📝 Abstract
Machine unlearning seeks to remove the influence of designated training data while preserving performance on the remaining data. Approximate unlearning can be viewed as a local editing problem; in min-max unlearning, the key local object is the surrogate point at which the retain objective is evaluated. When forget and retain gradients are strongly aligned, an unconstrained forget-maximizing perturbation can move to a surrogate point that increases retain loss. We propose Retain-Orthogonal Surrogate Unlearning (ROSU), which constrains the inner surrogate construction by maximizing first-order forget gain subject to zero first-order retain change under a fixed perturbation budget. This yields a closed-form retain-orthogonal perturbation, a lightweight transported outer update, and amplification along the retain-neutral direction. Our analysis establishes (i) a curvature-controlled second-order bound on retain damage, (ii) a positive-alignment regime in which ROSU strictly reduces surrogate retain loss relative to standard min-max perturbations, and (iii) near-equivalence when the two gradients are nearly orthogonal. Across vision and language benchmarks (CIFAR-10/100, Tiny-ImageNet, TOFU, WMDP), the empirical pattern follows this geometry: ROSU gives its clearest gains in high-coupling regimes while remaining competitive elsewhere.
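The inner problem stated in the abstract admits a short derivation. The sketch below rests on two assumptions: a Euclidean norm budget, and the notation g_f, g_r, rho, P and phi, which is introduced here for illustration and is not taken from the paper. It recovers a retain-orthogonal closed form and the first-order comparison with the standard min-max step. It is not necessarily the paper's exact formulation.

```latex
Let $g_f = \nabla \mathcal{L}_f(\theta)$, $g_r = \nabla \mathcal{L}_r(\theta)$, and
$P = I - \frac{g_r g_r^\top}{\|g_r\|^2}$ (orthogonal projector onto $g_r^\perp$).

Inner problem:
\[
\delta^\star = \arg\max_{\delta}\; g_f^\top \delta
\quad \text{s.t.} \quad g_r^\top \delta = 0,\;\; \|\delta\|_2 \le \rho .
\]
Any feasible $\delta$ satisfies $P\delta = \delta$, and $P = P^\top$, so by Cauchy-Schwarz
\[
g_f^\top \delta = (P g_f)^\top \delta \le \rho\,\|P g_f\|,
\qquad
\delta^\star = \rho\,\frac{P g_f}{\|P g_f\|} \quad (P g_f \neq 0).
\]
With $\cos\varphi = \frac{g_f^\top g_r}{\|g_f\|\,\|g_r\|}$ and the standard min-max step
$\delta_{\mathrm{mm}} = \rho\, g_f / \|g_f\|$:
\[
\begin{aligned}
g_r^\top \delta_{\mathrm{mm}} &= \rho\,\|g_r\|\cos\varphi, &\quad g_r^\top \delta^\star &= 0,\\
g_f^\top \delta_{\mathrm{mm}} &= \rho\,\|g_f\|, &\quad g_f^\top \delta^\star &= \rho\,\|g_f\|\sin\varphi .
\end{aligned}
\]
If $\nabla \mathcal{L}_r$ is $L$-Lipschitz (a generic smoothness assumption), then
\[
\mathcal{L}_r(\theta + \delta^\star) \le \mathcal{L}_r(\theta) + \tfrac{L}{2}\rho^2 .
\]
```

Positive alignment (cos phi > 0) is the case in which the min-max step pays a first-order retain cost. As phi approaches pi/2, P leaves g_f unchanged and the two perturbations coincide, which matches the near-orthogonal regime described above. The smoothness bound is the textbook descent-lemma form. It is offered as one illustration of a curvature-controlled bound, not as the paper's stated result.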