🤖 AI Summary
Adversarial training, while enhancing model robustness, exacerbates inter-class robustness imbalance and degrades clean-sample generalization. To address this, we propose a class-aware augmentation labeling mechanism that integrates fine-grained class information into adversarial training, jointly optimizing per-class accuracy and robustness on both clean and adversarial examples. Our method is the first to systematically characterize the “spillover effect” of adversarial training—where robustness gains for some classes inadvertently widen inter-class robustness disparities—and introduces a unified robustness-accuracy evaluation framework. Experiments demonstrate a 53.50% improvement in overall adversarial robustness, a 5.73% reduction in inter-class performance imbalance, and a statistically significant gain in clean-sample accuracy. The approach thus simultaneously advances robustness, generalization, and fairness across classes.
📝 Abstract
Efforts to address declining accuracy as a result of data shifts often involve various data-augmentation strategies. Adversarial training is one such method, designed to improve robustness to worst-case distribution shifts caused by adversarial examples. While this method can improve robustness, it may also hinder generalization to clean examples and exacerbate performance imbalances across different classes. This paper explores the impact of adversarial training on both overall and class-specific performance, as well as its spill-over effects. We observe that enhanced labeling during training boosts adversarial robustness by 53.50% and mitigates class imbalances by 5.73%, leading to improved accuracy in both clean and adversarial settings compared to standard adversarial training.