TRIX: Trading Adversarial Fairness via Mixed Adversarial Training

📅 2025-07-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing adversarial training applies a uniform attack objective to all classes, ignoring inter-class vulnerability disparities. This leads to "adversarial unfairness": robustness improves for strong classes (with well-separated features) while weak classes (with overlapping features) remain highly susceptible. This paper proposes TRIX, a hybrid adversarial training framework with a feature-aware mechanism: it adaptively generates weaker targeted adversarial examples (with uniformly sampled target classes) for strong classes and stronger untargeted examples for weak classes. TRIX further incorporates per-class loss weighting and perturbation strength adjustment to emphasize weak classes during optimization. On CIFAR-10, CIFAR-100, and an ImageNet subset, TRIX significantly improves worst-class clean and adversarial accuracy under PGD and AutoAttack, substantially narrowing inter-class robustness gaps without compromising overall clean accuracy, a practical step toward fair and effective adversarial defense.

📝 Abstract
Adversarial Training (AT) is a widely adopted defense against adversarial examples. However, existing approaches typically apply a uniform training objective across all classes, overlooking disparities in class-wise vulnerability. This results in adversarial unfairness: classes with well-distinguishable features (strong classes) tend to become more robust, while classes with overlapping or shared features (weak classes) remain disproportionately susceptible to adversarial attacks. We observe that strong classes do not require strong adversaries during training, as their non-robust features are quickly suppressed. In contrast, weak classes benefit from stronger adversaries to effectively reduce their vulnerabilities. Motivated by this, we introduce TRIX, a feature-aware adversarial training framework that adaptively assigns weaker targeted adversaries to strong classes, promoting feature diversity via uniformly sampled targets, and stronger untargeted adversaries to weak classes, enhancing their focused robustness. TRIX further incorporates per-class loss weighting and perturbation strength adjustments, building on prior work, to emphasize weak classes during optimization. Comprehensive experiments on standard image classification benchmarks, including evaluations under strong attacks such as PGD and AutoAttack, demonstrate that TRIX significantly improves worst-case class accuracy on both clean and adversarial data, reduces inter-class robustness disparities, and preserves overall accuracy. Our results highlight TRIX as a practical step toward fair and effective adversarial defense.
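The mixed attack assignment described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the linear classifier, the epsilon values, the step counts, and the function names (`trix_perturb`, `input_grad`) are illustrative assumptions, used here only to make the strong-class/weak-class split concrete with L∞-bounded PGD steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def input_grad(W, x, label):
    # Gradient of cross-entropy w.r.t. the input x for a linear model z = W @ x.
    p = softmax(W @ x)
    onehot = np.eye(W.shape[0])[label]
    return W.T @ (p - onehot)

def trix_perturb(W, x, y, strong_classes, eps_weak=0.3, eps_strong=0.1, steps=10):
    """Hypothetical sketch of TRIX-style example generation:
    strong classes get a *weaker targeted* attack toward a uniformly
    sampled target class; weak classes get a *stronger untargeted* attack."""
    x_adv = x.copy()
    if y in strong_classes:
        eps = eps_strong
        # Uniformly sample a target class != y (promotes feature diversity).
        label = rng.choice([c for c in range(W.shape[0]) if c != y])
        sign = -1.0   # descend the loss toward the sampled target class
    else:
        eps = eps_weak
        label = y
        sign = +1.0   # ascend the loss on the true label (untargeted)
    alpha = eps / steps
    for _ in range(steps):
        g = input_grad(W, x_adv, label)
        x_adv = x_adv + sign * alpha * np.sign(g)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the L-inf ball
    return x_adv
```

In a full training loop each perturbed batch would then be fed back through the model's adversarial training objective; the per-class budgets `eps_weak` and `eps_strong` stand in for the paper's perturbation strength adjustments.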
Problem

Research questions and friction points this paper is trying to address.

Addresses class-wise vulnerability disparities in adversarial training
Reduces adversarial unfairness between strong and weak classes
Improves worst-case class accuracy on clean and adversarial data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive adversarial training for class-wise fairness
Feature-aware mixed targeted and untargeted attacks
Per-class loss weighting and perturbation adjustments
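The per-class loss weighting listed above can be sketched in one small function. This is an assumption-laden illustration, not the paper's scheme: the softmax-over-negative-accuracy rule and the `temperature` parameter are hypothetical, chosen only to show the general idea of up-weighting classes with low robust accuracy.

```python
import numpy as np

def class_weights(class_robust_acc, temperature=1.0):
    """Hypothetical per-class weighting: classes with lower robust
    accuracy receive larger loss weights (softmax over negative accuracy),
    normalized so the average weight is 1 and the overall loss scale is kept."""
    a = np.asarray(class_robust_acc, dtype=float)
    w = np.exp(-a / temperature)
    return w / w.sum() * len(a)
```

For example, robust accuracies of `[0.9, 0.5, 0.2]` yield weights that increase as accuracy drops, so the weakest class dominates the weighted training loss.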