🤖 AI Summary
Existing theoretical frameworks do not distinguish whether adversarial examples exploit brittle but predictive non-robust features in the data, which can bias robustness evaluations. This work addresses the gap by formally separating adversarial examples into two types: those that rely on non-robust features and those that do not. The authors propose an ensemble-based metric to quantify the extent to which adversarial perturbations manipulate non-robust features. By integrating adversarial attack generation with robustness analysis, the framework clarifies how sharpness-aware minimization enhances model robustness and explains the performance gap between standard and adversarial training on robust datasets. This approach offers a refined perspective for evaluating and understanding model robustness.
📝 Abstract
Over the past decade, numerous theories have been proposed to explain the widespread vulnerability of deep neural networks to adversarial evasion attacks. Among these, the theory of non-robust features proposed by Ilyas et al. has been widely accepted, showing that brittle but predictive features of the data distribution can be directly exploited by attackers. However, this theory overlooks adversarial samples that do not directly utilize these features. In this work, we advocate that these two kinds of samples - those that use brittle but predictive features and those that do not - comprise two types of adversarial weaknesses and should be differentiated when evaluating adversarial robustness. For this purpose, we propose an ensemble-based metric to measure the manipulation of non-robust features by adversarial perturbations and use this metric to analyze the makeup of adversarial samples generated by attackers. This new perspective also allows us to re-examine multiple phenomena, including the impact of sharpness-aware minimization on adversarial robustness and the robustness gap observed between adversarial training and standard training on robust datasets.