🤖 AI Summary
Existing black-box adversarial attack methods implicitly rely on prior knowledge—such as the target model’s training dataset and number of classes—violating the pure black-box assumption and inflating estimates of transferability.
Method: This paper introduces the first systematic framework for prior-agnostic transfer attack evaluation, explicitly identifying and eliminating such priors. It proposes a novel image-blending augmentation strategy to enhance query-based surrogate model training and establishes a rigorous, reproducible, and interpretable pure black-box evaluation paradigm.
Contribution/Results: Experiments demonstrate that prior knowledge substantially inflates transfer success rates. Under strictly zero-knowledge conditions—i.e., without access to any target model priors—the proposed framework enables a robust, verifiable assessment of attack performance that also extends to query-based attacks. It provides a reliable benchmark for evaluating black-box adversarial robustness, advancing methodological rigor in this domain.
📝 Abstract
Despite their impressive performance, deep visual models are susceptible to transferable black-box adversarial attacks. In principle, these attacks craft perturbations in a target-model-agnostic manner. Surprisingly, however, we find that existing methods in this domain inadvertently rely on various priors that violate the black-box assumption, such as the availability of the dataset used to train the target model and knowledge of the number of classes it predicts. Consequently, the literature fails to articulate the true potency of transferable black-box attacks. We provide an empirical study of these biases and propose a framework that enables a prior-free, transparent study of this paradigm. Using our framework, we analyze the role of prior knowledge of the target model's data and number of classes in attack performance. We also provide several interesting insights based on our analysis and demonstrate that priors cause overestimation of transferability scores. Finally, we extend our framework to query-based attacks. This extension inspires a novel image-blending technique for preparing effective surrogate model training data.
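The abstract does not detail the image-blending technique itself, but the general idea of preparing surrogate training data by blending unlabeled images and labeling them via target-model queries can be sketched as follows. This is a minimal illustration assuming a mixup-style convex combination; the function names (`blend_images`, `make_surrogate_batch`) and the blending rule are hypothetical, not the paper's actual method.

```python
import numpy as np

def blend_images(img_a, img_b, alpha=0.5):
    """Convex blend of two images with pixel values in [0, 1]."""
    # Hypothetical blending rule: the paper's exact scheme is not
    # specified here, so a mixup-style convex combination is assumed.
    return alpha * img_a + (1.0 - alpha) * img_b

def make_surrogate_batch(images, rng, alpha_low=0.3, alpha_high=0.7):
    """Pair each image with a randomly chosen partner and blend the pair.

    The blended images could then be sent as queries to the black-box
    target model, whose outputs serve as pseudo-labels for training a
    surrogate model without any access to the original training data.
    """
    partner_idx = rng.permutation(len(images))
    alphas = rng.uniform(alpha_low, alpha_high, size=len(images))
    return np.stack([
        blend_images(images[i], images[j], a)
        for i, j, a in zip(range(len(images)), partner_idx, alphas)
    ])

rng = np.random.default_rng(0)
pool = rng.random((8, 32, 32, 3))   # toy stand-in for an unlabeled image pool
batch = make_surrogate_batch(pool, rng)
print(batch.shape)                  # (8, 32, 32, 3)
```

Because the blend is convex, outputs stay in the valid pixel range, and each query image mixes content from two sources, which broadens the coverage of the target model's decision space per query.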