Classification-Head Bias in Class-Level Machine Unlearning: Diagnosis, Mitigation, and Evaluation

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work examines the underexplored role of classification-head bias in class-level machine unlearning, where the bias can act as a shortcut that obscures how forgetting is achieved and risks leaking the forgotten labels. The study shows that predictions for forgotten classes can be suppressed simply by lowering the corresponding bias terms in the final classification head. The authors introduce BiasShift as a diagnostic baseline and propose two bias-aware mitigation mechanisms: Two-Stage Bias Gradient Reversal Mechanism (TS-BGRM) and Lower-Bound Hinge Regularization (LB-HR). They also introduce three metrics, Bias Stability Coefficient (BSC), Median Bias Gap (MBG), and Minimal Bias Score (MBS), to quantify bias dependence and potential leakage. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet indicate that the proposed methods produce more stable bias distributions while maintaining competitive unlearning performance.

📝 Abstract

Class-level machine unlearning aims to remove the influence of specified classes while preserving model utility on retained classes. Existing methods are commonly evaluated by retain-set accuracy, forget-set accuracy, and unlearning time, but these metrics provide limited insight into how forgetting is achieved internally. In this paper, we reveal a bias-dominated shortcut in class-level unlearning: the prediction of forgotten classes can be suppressed by decreasing the corresponding bias terms in the final classification head. We first analyze the gradient dynamics of classification-head biases under softmax cross-entropy training, explaining why retain-set-only optimization tends to reduce the biases of absent classes. Based on this observation, we introduce BiasShift as a diagnostic baseline, showing that simple bias manipulation can satisfy conventional unlearning metrics while leaving abnormal bias patterns that reveal forgotten labels. To mitigate excessive forgotten-class bias suppression, we propose two bias-aware mechanisms, namely Two-Stage Bias Gradient Reversal Mechanism (TS-BGRM) and Lower-Bound Hinge Regularization (LB-HR). We further introduce three bias-oriented metrics, namely Bias Stability Coefficient (BSC), Median Bias Gap (MBG), and Minimal Bias Score (MBS), to quantify bias dependence and potential leakage. Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that the proposed methods maintain competitive unlearning performance while producing more stable bias distributions. We have released our code at https://github.com/zwd2024/Beyond-the-Shadow-of-Bias-From-Classification-Head-Bias-to-Parameter-Redistribution.
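The gradient argument in the abstract can be illustrated with a small sketch. Under softmax cross-entropy, the gradient of the loss with respect to the bias of class k is p_k - y_k, so a class that never appears as a label in the batch always receives a positive gradient, and gradient descent lowers its bias. The code below is a toy illustration only, not the authors' implementation: the function names, logits, labels, and learning rate are all hypothetical values chosen for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bias_gradient(logits_batch, labels, num_classes):
    """Mean gradient of softmax cross-entropy w.r.t. each head bias.

    For one sample, dL/db_k = p_k - y_k, where y is the one-hot label.
    """
    grad = [0.0] * num_classes
    for logits, label in zip(logits_batch, labels):
        probs = softmax(logits)
        for k in range(num_classes):
            grad[k] += probs[k] - (1.0 if k == label else 0.0)
    return [g / len(labels) for g in grad]

# Hypothetical retain-only batch: class 2 plays the forgotten class
# and therefore never occurs among the labels.
logits_batch = [[2.0, 0.5, 1.0], [0.3, 1.8, 0.9], [1.5, 0.2, 0.7]]
labels = [0, 1, 0]
grad = bias_gradient(logits_batch, labels, num_classes=3)

# One plain gradient-descent step on the biases (learning rate is illustrative).
lr = 0.1
bias = [0.0, 0.0, 0.0]
new_bias = [b - lr * g for b, g in zip(bias, grad)]

print("bias gradient:", grad)
print("updated bias:", new_bias)
```

Because y_k is zero for the absent class on every sample, its gradient is the mean predicted probability, which is strictly positive, so the update pushes that bias down. This is the shortcut the abstract describes for retain-set-only optimization.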

Problem

Research questions and friction points this paper is trying to address.

machine unlearning

classification-head bias

bias suppression

class-level forgetting

bias leakage

Innovation

Methods, ideas, or system contributions that make the work stand out.

classification-head bias

class-level machine unlearning

bias-aware unlearning