Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Adversarial training (AT) and adversarial robustness distillation (ARD) suffer from class-wise robustness imbalance: classes that are easier to classify exhibit higher robustness, while harder classes show lower robustness, undermining robust fairness. To address this, the authors propose Anti-Bias Soft Label Distillation (ABSLD), whose core innovation lies in connecting the smoothness degree of soft labels to robust fairness, supported by both empirical observation and theoretical analysis. Within a knowledge distillation framework, ABSLD assigns varying temperatures to different classes, adaptively adjusting the class-wise smoothness of the teacher's soft labels during training and thereby reducing the student's error-risk gap across classes. As a label-based approach, ABSLD can also be combined with sample-based methods. Extensive experiments on multiple benchmark datasets demonstrate that ABSLD significantly improves class-wise robust fairness and achieves superior comprehensive performance of robustness and fairness compared to state-of-the-art methods.

📝 Abstract
Adversarial Training (AT) is widely recognized as an effective approach to enhance the adversarial robustness of Deep Neural Networks. As a variant of AT, Adversarial Robustness Distillation (ARD) has shown outstanding performance in enhancing the robustness of small models. However, both AT and ARD face a robust fairness issue: these models tend to display strong adversarial robustness against some classes (easy classes) while demonstrating weak adversarial robustness against others (hard classes). This paper explores the underlying factors of this problem and points out, from both empirical observation and theoretical analysis, that the smoothness degree of soft labels for different classes significantly impacts robust fairness. Based on this exploration, we propose Anti-Bias Soft Label Distillation (ABSLD) within the Knowledge Distillation framework to enhance adversarial robust fairness. Specifically, ABSLD adaptively reduces the student's error-risk gap between different classes by adjusting the class-wise smoothness degree of the teacher's soft labels during training, where the adjustment is managed by assigning varying temperatures to different classes. Additionally, as a label-based approach, ABSLD is highly adaptable and can be integrated with sample-based methods. Extensive experiments demonstrate that ABSLD outperforms state-of-the-art methods in the comprehensive performance of robustness and fairness.
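The central mechanism the abstract describes, softening the teacher's logits with a temperature chosen per class rather than a single global temperature, can be sketched as follows. This is a minimal illustration of class-wise temperature scaling, not the paper's actual implementation; the function name and array shapes are assumptions.

```python
import numpy as np

def classwise_soft_labels(logits, labels, temperatures):
    """Soften teacher logits with a per-class temperature.

    logits:       (N, C) teacher outputs
    labels:       (N,) ground-truth class index of each sample
    temperatures: (C,) one temperature per class; a higher T yields a
                  smoother (less confident) soft label for that class
    """
    t = temperatures[labels][:, None]            # temperature picked by each sample's class
    scaled = logits / t
    scaled -= scaled.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)  # row-wise softmax -> soft labels
```

With a single shared temperature this reduces to ordinary distillation; letting the temperature vary by class is what gives the method its per-class handle on label smoothness.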
Problem

Research questions and friction points this paper is trying to address.

Addresses robust fairness issue in adversarial training
Explores impact of soft label smoothness on fairness
Proposes Anti-Bias Soft Label Distillation for fairness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Anti-Bias Soft Label Distillation for fairness
Adaptive class-wise smoothness adjustment
Varying temperatures for different classes
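The adaptive adjustment listed above can be pictured as a simple feedback rule: classes where the student's error risk is above average get sharper teacher labels (lower temperature), and classes below average get smoother ones. This is an illustrative sketch only; the update rule, learning rate, and clipping bounds are assumptions, not the paper's exact formulation.

```python
import numpy as np

def update_temperatures(temperatures, class_risks, lr=0.1, t_min=0.5, t_max=5.0):
    """One hedged step of class-adaptive temperature adjustment.

    temperatures: (C,) current per-class temperatures
    class_risks:  (C,) current per-class student error risk
    Classes with above-average risk (hard classes) get a lower temperature,
    i.e. sharper teacher labels; below-average (easy) classes get a higher
    one, narrowing the class-wise risk gap over training.
    """
    gap = class_risks - class_risks.mean()  # positive for hard classes
    new_t = temperatures - lr * gap         # hard classes -> sharper labels
    return np.clip(new_t, t_min, t_max)     # keep temperatures in a sane range
```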
Shiji Zhao
Beihang University
Machine Learning · Trustworthy AI · Explainable AI · Robust AI
Chi Chen
School of Software, Beihang University, No.37, Xueyuan Road, Haidian District, Beijing, 100191, P.R. China
Ranjie Duan
Alibaba Group
AI · AI Safety · AI for Common Prosperity
Xizhe Wang
Institute of Artificial Intelligence, Beihang University, No.37, Xueyuan Road, Haidian District, Beijing, 100191, P.R. China
Xingxing Wei
Professor of Artificial Intelligence, Beihang University
Computer Vision · Adversarial Machine Learning