Releasing Inequality Phenomena in L∞-Adversarial Training via Input Gradient Distillation

πŸ“… 2023-05-16
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
β„“βˆž-adversarial training (β„“βˆž-AT) induces imbalanced input gradient distributions, rendering models overly sensitive to perturbations on salient pixels and degrading their generalization robustness against occlusions, in-distribution noise, and natural corruptions (e.g., ImageNet-C). To address this, we propose Input Gradient Distillation (IGD), the first method that explicitly equalizes saliency map distributions via input gradient distillationβ€”thereby theoretically mitigating gradient imbalance. IGD is plug-and-play, parameter-free, and seamlessly integrates with PGD-based adversarial training (PGDAT). Experiments demonstrate that, compared to standard β„“βˆž-AT, IGD reduces error rates by 16.53% on occlusion benchmarks, 60% on in-distribution noise, and 21.11% on ImageNet-C, while fully preserving the original β„“βˆž-adversarial accuracy. This yields substantial improvements in robust generalization without compromising worst-case adversarial robustness.
πŸ“ Abstract
Since adversarial examples appeared and showed the catastrophic degradation they brought to DNN, many adversarial defense methods have been devised, among which adversarial training is considered the most effective. However, a recent work showed the inequality phenomena in $l_{infty}$-adversarial training and revealed that the $l_{infty}$-adversarially trained model is vulnerable when a few important pixels are perturbed by i.i.d. noise or occluded. In this paper, we propose a simple yet effective method called Input Gradient Distillation (IGD) to release the inequality phenomena in $l_{infty}$-adversarial training. Experiments show that while preserving the model's adversarial robustness, compared to PGDAT, IGD decreases the $l_{infty}$-adversarially trained model's error rate to inductive noise and inductive occlusion by up to 60% and 16.53%, and to noisy images in Imagenet-C by up to 21.11%. Moreover, we formally explain why the equality of the model's saliency map can improve such robustness.
Problem

Research questions and friction points this paper is trying to address.

Address uneven input gradients in l∞-norm adversarial training
Mitigate vulnerability to high-attribution pixel perturbations
Improve robustness against inductive and random noise attacks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Input Gradient Distillation for inequality mitigation
Aligns student and teacher model gradients
Improves robustness via Cosine Similarity alignment
πŸ”Ž Similar Papers
No similar papers found.
Junxi Chen
Junxi Chen
SUN YAT-SEN UNIVERSITY
Deep learning
J
Junhao Dong
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Guangdong Province Key Laboratory of Information Security Technology, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China
X
Xiaohua Xie
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Guangdong Province Key Laboratory of Information Security Technology, Guangzhou, China; Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China