🤖 AI Summary
Existing data-free robustness distillation (DFRD) methods neglect inter-class robustness fairness, resulting in significant disparities in student-model robustness across classes and attack targets. To address this, we propose the first data-free distillation framework explicitly designed to achieve class-wise robustness equity. Our method comprises two key components: (1) a robustness-guided class reweighting strategy that dynamically increases the sampling weight of vulnerable classes during adversarial example generation; and (2) a fairness-aware adversarial generation mechanism that jointly applies a uniformity constraint on feature-level predictions and a uniform-target attack to produce more balanced adversarial examples. Evaluated on CIFAR-10, our approach improves the worst-class robust accuracy of a MobileNet-V2 student model by 15.1% under FGSM and 6.4% under AutoAttack, demonstrating substantial gains in both robustness fairness and stability.
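The class-reweighting idea can be sketched in a few lines of PyTorch-style Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the particular mapping from per-class robust accuracy to sampling weights (a negated-accuracy softmax with a `temperature` knob) and the example accuracy values are hypothetical.

```python
import torch

def robustness_guided_class_weights(per_class_robust_acc: torch.Tensor,
                                    temperature: float = 1.0) -> torch.Tensor:
    """Assumed sketch: map per-class robust accuracy to sampling weights.

    Classes with lower robust accuracy receive higher weight, so more
    synthetic examples are generated for the vulnerable classes.
    """
    # Negate accuracy so that weaker classes dominate the softmax.
    logits = -per_class_robust_acc / temperature
    return torch.softmax(logits, dim=0)

# Usage: draw class labels for the next batch of synthetic examples.
per_class_robust_acc = torch.tensor([0.62, 0.35, 0.58, 0.21])  # hypothetical values
weights = robustness_guided_class_weights(per_class_robust_acc, temperature=0.5)
labels = torch.multinomial(weights, num_samples=128, replacement=True)
```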
📝 Abstract
Data-Free Robustness Distillation (DFRD) aims to transfer robustness from a teacher model to a student model without access to the training data. While existing methods focus on overall robustness, they overlook robust fairness, leading to severe disparities in robustness across categories. In this paper, we identify two key problems: (1) a student model distilled with equal class proportions behaves significantly differently across categories; and (2) the robustness of the student model is not stable across different attack targets. To bridge these gaps, we present the first Fairness-Enhanced data-free Robustness Distillation (FERD) framework, which adjusts both the proportion and the distribution of adversarial examples. For the proportion, FERD adopts a robustness-guided class reweighting strategy that synthesizes more samples for the less robust categories, thereby improving their robustness. For the distribution, FERD generates complementary data samples for advanced robustness distillation. It first generates Fairness-Aware Examples (FAEs) by enforcing a uniformity constraint on feature-level predictions, which suppresses the dominance of class-specific non-robust features and provides a more balanced representation across all categories. FERD then constructs Uniform-Target Adversarial Examples (UTAEs) from the FAEs by applying a uniform target-class constraint that avoids biased attack directions, distributing the attack targets across all categories and preventing overfitting to specific vulnerable categories. Extensive experiments on three public datasets show that FERD achieves state-of-the-art worst-class robustness under all adversarial attacks (e.g., the worst-class robustness under FGSM and AutoAttack is improved by 15.1% and 6.4%, respectively, using MobileNet-V2 on CIFAR-10), demonstrating superior performance in both robustness and fairness.
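To make the two generation components concrete, here is a minimal PyTorch sketch. Since the abstract does not specify FERD's exact losses or attack parameters, everything below is an illustrative reconstruction: the KL-to-uniform form of the FAE uniformity loss, the round-robin assignment of attack targets, and the targeted-PGD construction of UTAEs are all assumptions.

```python
import torch
import torch.nn.functional as F

def uniformity_loss(feature_logits: torch.Tensor) -> torch.Tensor:
    """FAE objective (assumed form): push feature-level class predictions
    toward a uniform distribution, suppressing class-specific non-robust
    features. Minimized while synthesizing the FAEs."""
    log_probs = F.log_softmax(feature_logits, dim=1)
    uniform = torch.full_like(log_probs, 1.0 / feature_logits.size(1))
    # KL(uniform || predicted) encourages balanced predictions over classes.
    return F.kl_div(log_probs, uniform, reduction="batchmean")

def uniform_target_attack(model, faes, num_classes, eps=8/255, alpha=2/255, steps=10):
    """UTAE construction (sketch): targeted PGD where attack targets are
    spread uniformly over all classes (here, a simple round-robin)."""
    targets = torch.arange(faes.size(0), device=faes.device) % num_classes
    adv = faes.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), targets)
        grad = torch.autograd.grad(loss, adv)[0]
        # Descend on the targeted loss to move toward the assigned target class,
        # then project back into the eps-ball and the valid pixel range.
        adv = (adv.detach() - alpha * grad.sign()).clamp(faes - eps, faes + eps).clamp(0, 1)
    return adv, targets
```

In a full pipeline, the UTAEs produced this way, together with the reweighted synthetic samples, would then feed a standard teacher-student robustness-distillation loss.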