🤖 AI Summary
Existing transfer-based adversarial attack evaluations suffer from two key flaws: (1) unfair intra-class transferability comparisons due to inconsistent hyperparameter configurations, and (2) overly narrow stealthiness assessment that relies on a single metric. This paper introduces the first classification-driven, fairness-aware evaluation framework, pioneering a taxonomy-based attack categorization strategy and establishing a multi-dimensional evaluation criterion that jointly measures transferability and source traceability, incorporating $L_p$ distortion, perceptual imperceptibility, and gradient-based traceability. The authors conduct a large-scale benchmark on ImageNet, evaluating 23 attacks against 9 defenses. The analysis reveals a significant negative correlation between transferability and stealthiness; demonstrates that the early Diverse Input (DI) attack outperforms all follow-up methods under fair settings; and shows that state-of-the-art defenses, including DiffPure, are readily bypassed by black-box transferable attacks. These findings challenge several widely held assumptions in the field and expose systematic misjudgments of progress stemming from flawed prior evaluation practices.
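To make the multi-dimensional stealthiness criterion concrete, here is a minimal sketch of the kind of per-image distortion metrics such an evaluation might aggregate alongside perceptual and traceability scores. The helper name and metric selection are illustrative assumptions, not the paper's code:

```python
import numpy as np

def distortion_metrics(x, x_adv):
    """Hypothetical helper: L_p statistics of the adversarial perturbation.

    A single-metric evaluation would report only one of these; a
    multi-dimensional one reports several, plus perceptual scores.
    """
    d = (x_adv - x).ravel()
    return {
        "linf": float(np.max(np.abs(d))),   # worst-case per-pixel change
        "l2": float(np.linalg.norm(d)),     # overall perturbation energy
        "l0": int(np.count_nonzero(d)),     # number of modified pixels
    }
```

Even attacks bounded by the same $L_\infty$ budget can differ sharply on the other entries, which is one way the "same norm bound, different stealthiness" finding can be quantified.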
📝 Abstract
Transferable adversarial examples raise critical security concerns in real-world, black-box attack scenarios. However, in this work, we identify two main problems in common evaluation practices: (1) for attack transferability, the lack of systematic, one-to-one attack comparisons under fair hyperparameter settings, and (2) for attack stealthiness, the absence of any comparison at all. To address these problems, we establish new evaluation guidelines by (1) proposing a novel attack categorization strategy and conducting systematic, fair intra-category analyses of transferability, and (2) considering diverse imperceptibility metrics and finer-grained stealthiness characteristics from the perspective of attack traceback. To this end, we provide the first large-scale evaluation of transferable adversarial examples on ImageNet, involving 23 representative attacks against 9 representative defenses. Our evaluation leads to a number of new insights, including consensus-challenging ones: (1) Under a fair attack hyperparameter setting, one early attack method, DI, actually outperforms all the follow-up methods. (2) A state-of-the-art defense, DiffPure, gives a false sense of (white-box) security, since it is largely bypassed by our (black-box) transferable attacks. (3) Even when all attacks are bounded by the same $L_p$ norm, they exhibit dramatically different stealthiness, which correlates negatively with their transferability. Overall, our work demonstrates that existing problematic evaluation practices have indeed led to misleading conclusions and overlooked findings, and, as a result, have hindered the assessment of the actual progress in this field.
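For readers unfamiliar with DI, its core idea is to apply a random resize-and-pad transform to the input before each gradient step of an iterative, $L_\infty$-bounded attack, which improves transfer to unseen models. The following NumPy sketch shows this mechanic under heavy simplifying assumptions: a toy logistic surrogate model, nearest-neighbor resizing, the gradient taken at the transformed input rather than backpropagated through a differentiable resize, and a normalized raw-gradient step instead of the usual sign step (with a linear surrogate, the sign would not vary with the transform). Function names are illustrative, not the paper's code:

```python
import numpy as np

def diverse_input(x, p=0.5, rng=None):
    # DI transform: with probability p, randomly shrink the image
    # (nearest-neighbor) and zero-pad it back to its original size.
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() >= p:
        return x
    h = x.shape[0]
    new = int(rng.integers(h // 2, h))       # random smaller side length
    idx = (np.arange(new) * h) // new        # nearest-neighbor source rows/cols
    small = x[np.ix_(idx, idx)]
    top = int(rng.integers(0, h - new + 1))
    left = int(rng.integers(0, h - new + 1))
    out = np.zeros_like(x)
    out[top:top + new, left:left + new] = small
    return out

def di_attack(x, y, w, eps=0.05, alpha=0.01, steps=10, p=0.5, seed=0):
    # Iterative L_inf-bounded attack with input diversity on a toy
    # logistic surrogate: loss = log(1 + exp(-y * <w, T(x)>)), y in {-1, +1}.
    # We ascend the loss, evaluating the gradient at the transformed input.
    rng = np.random.default_rng(seed)
    x_adv = x.copy()
    for _ in range(steps):
        xt = diverse_input(x_adv, p=p, rng=rng)
        s = float(np.dot(w.ravel(), xt.ravel()))
        grad = -y / (1.0 + np.exp(y * s)) * w          # dLoss/dx at xt
        step = grad / (np.max(np.abs(grad)) + 1e-12)   # L_inf-normalized step
        x_adv = x_adv + alpha * step
        x_adv = np.clip(x_adv, x - eps, x + eps)       # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)               # keep valid pixel range
    return x_adv
```

The projection step after each update is what keeps every attack comparable under the same $L_\infty$ budget; the fairness issue the paper raises concerns the remaining hyperparameters (step size, iteration count, transform probability), which prior evaluations did not hold consistent across methods.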