🤖 AI Summary
This work addresses the challenge of machine unlearning in the presence of dataset bias, where models tend to learn spurious correlations, rendering conventional unlearning methods ineffective, particularly for "shortcut" biased samples that are easy to learn yet hard to forget. To tackle this issue, the authors propose the CUPID framework, which leverages differences in the sharpness of per-sample loss landscapes to partition the forget set into causal and bias subsets. CUPID decouples model parameters into distinct causal and bias pathways and applies targeted gradient updates accordingly. Notably, this is the first approach to utilize loss sharpness for both sample partitioning and parameter disentanglement, effectively overcoming the shortcut forgetting problem. Extensive experiments on Waterbirds, BAR, and Biased NICO++ demonstrate that CUPID substantially outperforms existing unlearning methods, achieving state-of-the-art performance.
📝 Abstract
Machine unlearning, which enables a model to forget specific data, is crucial for ensuring data privacy and model reliability. However, its effectiveness can be severely undermined in real-world scenarios where models learn unintended biases from spurious correlations within the data. This paper investigates the unique challenges of unlearning from such biased models. We identify a novel phenomenon we term "shortcut unlearning," where models exhibit an "easy to learn, yet hard to forget" tendency. Specifically, models struggle to forget easily learned, bias-aligned samples; instead of forgetting the class attribute, they unlearn the bias attribute, which can paradoxically improve accuracy on the class intended to be forgotten. To address this, we propose CUPID, a new unlearning framework inspired by the observation that samples with different biases exhibit distinct loss landscape sharpness. Our method first partitions the forget set into causal- and bias-approximated subsets based on sample sharpness, then disentangles model parameters into causal and bias pathways, and finally performs a targeted update by routing refined causal and bias gradients to their respective pathways. Extensive experiments on biased datasets including Waterbirds, BAR, and Biased NICO++ demonstrate that our method achieves state-of-the-art forgetting performance and effectively mitigates the shortcut unlearning problem.
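To make the first step of the pipeline concrete, the sketch below illustrates sharpness-based partitioning of a forget set. This is a minimal stand-in, not the paper's implementation: it approximates per-sample sharpness as the loss increase under a small weight perturbation (a common proxy), and the function names, the median-split threshold, and the assignment of the sharper subset to the "causal" side are all illustrative assumptions.

```python
import numpy as np

def sharpness_scores(losses_orig, losses_perturbed):
    """Approximate per-sample sharpness as the loss increase when the
    model weights receive a small random perturbation. This is a common
    proxy for local loss-landscape sharpness; CUPID's exact measure
    may differ."""
    return np.asarray(losses_perturbed) - np.asarray(losses_orig)

def partition_forget_set(scores, threshold=None):
    """Split forget-set indices into two approximated subsets by
    thresholding sharpness (median split if no threshold is given).
    Which side corresponds to causal vs. bias-aligned samples is an
    empirical choice; the labels here are illustrative."""
    scores = np.asarray(scores)
    if threshold is None:
        threshold = np.median(scores)
    causal_idx = np.where(scores > threshold)[0]   # sharper samples
    bias_idx = np.where(scores <= threshold)[0]    # flatter samples
    return causal_idx, bias_idx
```

In a full pipeline, the two index sets would then drive separate gradient computations, with each routed to its corresponding (causal or bias) parameter pathway during the unlearning update.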