🤖 AI Summary
Existing evaluations of gradient-based adversarial attacks suffer from optimistic bias and poor reproducibility, caused by inconsistent perturbation budgets and non-uniform benchmarks. Method: We propose AttackBench, a fair and reproducible evaluation framework for gradient-based attacks. It introduces an optimality metric estimated from a multi-attack ensemble, enabling standardized comparison across algorithms and implementation libraries under a strictly fixed forward/backward query budget. Contribution/Results: We systematically evaluate more than 100 mainstream attack implementations in over 800 configurations on CIFAR-10 and ImageNet models. Results reveal that only a few methods are consistently superior across diverse settings. To foster transparency and rigor, we open-source the benchmarking platform together with a dynamic leaderboard, establishing reliable infrastructure for adversarial robustness research.
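The fixed forward/backward query budget mentioned above could be enforced with a simple counting wrapper around the target model. The sketch below is a hypothetical illustration (class name, method names, and error handling are assumptions, not the framework's actual API): every attack, regardless of library, goes through the same wrapper and is cut off once it exhausts its allotted queries.

```python
class QueryBudget:
    """Hypothetical wrapper enforcing a fixed forward/backward query budget.

    model_fn / grad_fn are placeholders for the target model's prediction
    and gradient functions; max_forward / max_backward are the budgets
    shared by all attacks under comparison.
    """

    def __init__(self, model_fn, grad_fn, max_forward, max_backward):
        self.model_fn, self.grad_fn = model_fn, grad_fn
        self.max_forward, self.max_backward = max_forward, max_backward
        self.forwards = self.backwards = 0

    def forward(self, x):
        # Count one forward query; refuse once the budget is spent.
        if self.forwards >= self.max_forward:
            raise RuntimeError("forward query budget exhausted")
        self.forwards += 1
        return self.model_fn(x)

    def backward(self, x):
        # Count one backward (gradient) query against its own budget.
        if self.backwards >= self.max_backward:
            raise RuntimeError("backward query budget exhausted")
        self.backwards += 1
        return self.grad_fn(x)
```

Because the counters live in the wrapper rather than in any attack's own bookkeeping, implementations cannot under-report how much computation they consumed.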
📝 Abstract
While novel gradient-based attacks are continuously proposed to improve the optimization of adversarial examples, each is shown to outperform its predecessors using different experimental setups, implementations, and computational budgets, leading to biased and unfair comparisons. In this work, we overcome this issue by proposing AttackBench, an evaluation framework that measures the effectiveness of each attack (along with its different library implementations) under the same maximum available computational budget. To this end, we (i) define a novel optimality metric that quantifies how close each attack is to the optimal solution (empirically estimated by ensembling all attacks), and (ii) limit the maximum number of forward and backward queries that each attack can execute on the target model. Our extensive experimental analysis compares more than 100 attack implementations over 800 different configurations, considering both CIFAR-10 and ImageNet models, and shows that only a few attack implementations outperform all the remaining approaches. These findings suggest that novel defenses should be evaluated against attacks different from those normally used in the literature, to avoid overly optimistic robustness evaluations. We release AttackBench as a publicly available benchmark, including a continuously updated leaderboard and source code, to maintain an up-to-date ranking of the best gradient-based attacks.
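The ensemble-based optimality metric described in point (i) can be sketched as follows. This is one plausible instantiation under assumed conventions, not necessarily the paper's exact definition: each attack reports, per sample, the smallest perturbation norm it found within the query budget (with `inf` marking failures); the per-sample minimum across all attacks serves as the empirical optimum, and an attack's score is how closely it tracks that optimum on average.

```python
import numpy as np

def optimality(per_attack_norms):
    """Illustrative ensemble-based optimality score (assumed formulation).

    per_attack_norms: dict mapping attack name -> array of per-sample
    minimal perturbation norms found within the query budget
    (np.inf where the attack failed on that sample).
    Returns a dict mapping attack name -> score in [0, 1], where 1 means
    the attack matches the empirical optimum (the per-sample best norm
    over all attacks) on every sample.
    """
    norms = np.stack(list(per_attack_norms.values()))  # (n_attacks, n_samples)
    best = norms.min(axis=0)                           # empirical optimum per sample
    scores = {}
    for name, dists in per_attack_norms.items():
        # Ratio of best-known norm to this attack's norm; failures score 0.
        ratio = np.where(np.isfinite(dists),
                         best / np.maximum(dists, 1e-12), 0.0)
        scores[name] = float(ratio.mean())
    return scores
```

An attack that finds the smallest perturbation on every sample scores 1.0; weaker or failing attacks score proportionally lower, which gives a single comparable number per implementation for the leaderboard ranking.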