Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Adversarial training (AT) and adversarial robustness distillation (ARD) suffer from class-wise robustness imbalance: classes that are easier to classify exhibit higher robustness, while harder classes show lower robustness, undermining robust fairness. To address this, the authors propose Anti-Bias Soft Label Distillation (ABSLD), whose core innovation lies in connecting the smoothness degree of soft labels to robust fairness, supported by both empirical observation and theoretical analysis. Within a knowledge distillation framework, ABSLD assigns varying temperatures to different classes, adaptively adjusting the class-wise smoothness of the teacher's soft labels during training and thereby reducing the student's error-risk gap across classes. As a label-based approach, ABSLD can also be combined with sample-based methods. Extensive experiments on multiple benchmark datasets demonstrate that ABSLD significantly improves class-wise robust fairness and achieves superior comprehensive performance of robustness and fairness compared to state-of-the-art methods.

📝 Abstract
Adversarial Training (AT) is widely recognized as an effective approach to enhance the adversarial robustness of Deep Neural Networks. As a variant of AT, Adversarial Robustness Distillation (ARD) has shown outstanding performance in enhancing the robustness of small models. However, both AT and ARD face a robust fairness issue: these models tend to display strong adversarial robustness against some classes (easy classes) while demonstrating weak adversarial robustness against others (hard classes). This paper explores the underlying factors of this problem and points out, from both empirical observation and theoretical analysis, that the smoothness degree of soft labels for different classes significantly impacts robust fairness. Based on this exploration, we propose Anti-Bias Soft Label Distillation (ABSLD) within the Knowledge Distillation framework to enhance adversarial robust fairness. Specifically, ABSLD adaptively reduces the student's error-risk gap between different classes by adjusting the class-wise smoothness degree of the teacher's soft labels during training, where the adjustment is managed by assigning varying temperatures to different classes. Additionally, as a label-based approach, ABSLD is highly adaptable and can be integrated with sample-based methods. Extensive experiments demonstrate that ABSLD outperforms state-of-the-art methods in the comprehensive performance of robustness and fairness.
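The central mechanism the abstract describes, softening the teacher's logits with a temperature chosen per class rather than a single global temperature, can be sketched as follows. This is a minimal illustration of class-wise temperature scaling, not the paper's actual implementation; the function name and array shapes are assumptions.

```python
import numpy as np

def classwise_soft_labels(logits, labels, temperatures):
    """Soften teacher logits with a per-class temperature.

    logits:       (N, C) teacher outputs
    labels:       (N,) ground-truth class index of each sample
    temperatures: (C,) one temperature per class; a higher T yields a
                  smoother (less confident) soft label for that class
    """
    t = temperatures[labels][:, None]            # temperature picked by each sample's class
    scaled = logits / t
    scaled -= scaled.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)  # row-wise softmax -> soft labels
```

With a single shared temperature this reduces to ordinary distillation; letting the temperature vary by class is what gives the method its per-class handle on label smoothness.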
Problem

Research questions and friction points this paper is trying to address.

Addresses robust fairness issue in adversarial training
Explores impact of soft label smoothness on fairness
Proposes Anti-Bias Soft Label Distillation for fairness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Anti-Bias Soft Label Distillation for fairness
Adaptive class-wise smoothness adjustment
Varying temperatures for different classes
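The adaptive adjustment listed above can be pictured as a simple feedback rule: classes where the student's error risk is above average get sharper teacher labels (lower temperature), and classes below average get smoother ones. This is an illustrative sketch only; the update rule, learning rate, and clipping bounds are assumptions, not the paper's exact formulation.

```python
import numpy as np

def update_temperatures(temperatures, class_risks, lr=0.1, t_min=0.5, t_max=5.0):
    """One hedged step of class-adaptive temperature adjustment.

    temperatures: (C,) current per-class temperatures
    class_risks:  (C,) current per-class student error risk
    Classes with above-average risk (hard classes) get a lower temperature,
    i.e. sharper teacher labels; below-average (easy) classes get a higher
    one, narrowing the class-wise risk gap over training.
    """
    gap = class_risks - class_risks.mean()  # positive for hard classes
    new_t = temperatures - lr * gap         # hard classes -> sharper labels
    return np.clip(new_t, t_min, t_max)     # keep temperatures in a sane range
```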
Shiji Zhao
Beihang University
Machine Learning · Trustworthy AI · Explainable AI · Robust AI
Chi Chen
School of Software, Beihang University, No.37, Xueyuan Road, Haidian District, Beijing, 100191, P.R. China
Ranjie Duan
Alibaba Group
AI · AI Safety · AI for Common Prosperity
Xizhe Wang
Institute of Artificial Intelligence, Beihang University, No.37, Xueyuan Road, Haidian District, Beijing, 100191, P.R. China
Xingxing Wei
Professor of Artificial Intelligence, Beihang University
Computer Vision · Adversarial Machine Learning