Subtract the Corruption: Training-Data-Free Corrective Machine Unlearning using Task Arithmetic

📅 2025-11-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging problem of model debiasing when training data is inaccessible and contaminated samples are unknown. We propose the first “source-free” corrective machine unlearning framework, which requires neither the original training data nor labels for contaminated samples—only a small set of proxy samples. Our method models the contamination effect as an independent task via task arithmetic in weight space: it fine-tunes the model on proxy contaminated samples to obtain a weight-difference vector, calibrates this vector to isolate the contamination bias, and subtracts it from the original model weights. Experiments demonstrate that our approach significantly restores model performance under label noise and achieves near-complete removal of backdoor triggers, while preserving the model’s original functionality. Crucially, it outperforms existing specialized debiasing and defense methods across multiple benchmarks, establishing a new state-of-the-art for source-free model correction.

📝 Abstract
Corrupted training data are ubiquitous. Corrective Machine Unlearning (CMU) seeks to remove the influence of such corruption post-training. Prior CMU typically assumes access to identified corrupted training samples (a "forget set"). However, in many real-world scenarios the training data are no longer accessible. We formalize *source-free* CMU, where the original training data are unavailable and, consequently, no forget set of identified corrupted training samples can be specified. Instead, we assume a small proxy (surrogate) set of corrupted samples that reflects the suspected corruption type without needing to contain the original training samples. In this stricter setting, methods relying on a forget set are ineffective or narrow in scope. We introduce *Corrective Unlearning in Task Space* (CUTS), a lightweight weight-space correction method guided by the proxy set using task arithmetic principles. CUTS treats the clean signal and the corruption signal as distinct tasks. Specifically, we briefly fine-tune the corrupted model on the proxy set to amplify the corruption mechanism in weight space, compute the difference between the corrupted and fine-tuned weights as a proxy task vector, and subtract a calibrated multiple of this vector to cancel the corruption. Without access to clean data or a forget set, CUTS recovers a large fraction of the lost utility under label noise and, for backdoor triggers, nearly eliminates the attack with minimal damage to utility, outperforming state-of-the-art specialized CMU methods in the source-free setting.
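The weight-space arithmetic described in the abstract can be sketched in a few lines. This is a minimal illustration based only on the abstract's description, not the authors' code: the dict-of-arrays weight format, the function names, and the scalar `alpha` (the "calibrated multiple") are assumptions; how `alpha` is calibrated in practice is not shown here.

```python
import numpy as np

def proxy_task_vector(corrupted_weights, finetuned_weights):
    """Proxy task vector: per-parameter difference between the model briefly
    fine-tuned on the proxy corrupted set and the original corrupted model."""
    return {name: finetuned_weights[name] - corrupted_weights[name]
            for name in corrupted_weights}

def subtract_task_vector(corrupted_weights, task_vector, alpha):
    """Correct the model by subtracting a calibrated multiple (alpha)
    of the proxy task vector from the corrupted weights."""
    return {name: corrupted_weights[name] - alpha * task_vector[name]
            for name in corrupted_weights}

# Toy example with a single 'layer' of weights.
theta_corrupted = {"w": np.array([1.0, 2.0])}
theta_finetuned = {"w": np.array([1.5, 2.5])}  # after brief fine-tuning on the proxy set

v = proxy_task_vector(theta_corrupted, theta_finetuned)
theta_corrected = subtract_task_vector(theta_corrupted, v, alpha=1.0)
```

With `alpha = 1.0` this simply rolls the weights back along the direction amplified by proxy fine-tuning; in the paper's setting `alpha` is calibrated so that the subtraction cancels the corruption while preserving utility.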
Problem

Research questions and friction points this paper is trying to address.

Removing corruption influence from models without original training data access
Correcting corrupted models using small proxy sets of corrupted samples
Applying task arithmetic to subtract corruption signals from model weights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses task arithmetic for weight correction
Amplifies corruption via proxy set fine-tuning
Subtracts calibrated task vector to remove corruption