Seeking Flat Minima over Diverse Surrogates for Improved Adversarial Transferability: A Theoretical Framework and Algorithmic Instantiation

📅 2025-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Black-box transfer attacks suffer from weak cross-model generalization of adversarial examples, and most existing methods lack a rigorous theoretical foundation. Method: This paper proposes a theory-driven unified attack framework. First, it derives a provable bound on transferability that exposes the joint impact of surrogate-target model discrepancy and adversarial flatness. Second, it designs the model-Diversity-compatible Reverse Adversarial Perturbation (DRAP), which promotes the flatness of adversarial examples over a diverse set of surrogate models constructed to narrow the model discrepancy. Results: Experiments on NIPS2017 and CIFAR-10 show that DRAP outperforms state-of-the-art methods against diverse target architectures, including ResNet, VGG, and DenseNet. The experiments also examine how the transfer factors identified by the theory interact, connecting theory and practice in black-box transfer attacks.

📝 Abstract

The transfer-based black-box adversarial attack setting poses the challenge of crafting an adversarial example (AE) on known surrogate models that remains effective against unseen target models. Due to the practical importance of this task, numerous methods have been proposed to address this challenge. However, most previous methods are heuristically designed and intuitively justified, lacking a theoretical foundation. To bridge this gap, we derive a novel transferability bound that offers provable guarantees for adversarial transferability. Our theoretical analysis has the advantages of (i) deepening our understanding of previous methods by building a general attack framework and (ii) providing guidance for designing an effective attack algorithm. Our theoretical results demonstrate that optimizing AEs toward flat minima over the surrogate model set, while controlling the surrogate-target model shift measured by the adversarial model discrepancy, yields a comprehensive guarantee for AE transferability. The results further lead to a general transfer-based attack framework, within which we observe that previous methods consider only some of the factors contributing to transferability. Algorithmically, inspired by our theoretical results, we first carefully construct a surrogate model set in which the models exhibit diverse adversarial vulnerabilities with respect to AEs, in order to narrow an instantiated adversarial model discrepancy. Then, a model-Diversity-compatible Reverse Adversarial Perturbation (DRAP) is generated to effectively promote the flatness of AEs over diverse surrogate models and thereby improve transferability. Extensive experiments on the NIPS2017 and CIFAR-10 datasets against various target models demonstrate the effectiveness of our proposed attack.
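The idea of seeking a flat optimum over a set of surrogates can be illustrated with a small min-max loop. The sketch below is not the paper's DRAP algorithm: it is a generic reverse-perturbation-style attack under loudly stated assumptions. The surrogates are hypothetical linear logistic classifiers, the loss is binary cross-entropy averaged over the surrogate set, and all names (`ensemble_loss_and_grad`, `flat_ensemble_attack`, `eps`, `eps_n`) and step sizes are illustrative choices. The diversity-aware construction of the surrogate set and the model discrepancy term described in the abstract are omitted.

```python
import numpy as np

def ensemble_loss_and_grad(x, y, weights, biases):
    """Mean binary cross-entropy over K linear surrogates, and its gradient w.r.t. x.

    weights has shape (K, D), biases has shape (K,), x has shape (D,), y is 0 or 1.
    """
    logits = weights @ x + biases                       # shape (K,)
    probs = 1.0 / (1.0 + np.exp(-logits))               # sigmoid, shape (K,)
    tiny = 1e-12                                        # avoids log(0)
    loss = -np.mean(y * np.log(probs + tiny) + (1 - y) * np.log(1 - probs + tiny))
    # d(BCE)/d(logit) = p - y, and d(logit)/dx = w_k; average over surrogates.
    grad = np.mean((probs - y)[:, None] * weights, axis=0)
    return loss, grad

def flat_ensemble_attack(x, y, weights, biases, eps=0.3, eps_n=0.1,
                         outer_steps=20, inner_steps=5):
    """Untargeted L-infinity attack that raises the loss at the worst nearby point."""
    alpha = 2.0 * eps / outer_steps       # outer step size (assumed heuristic)
    alpha_n = 2.0 * eps_n / inner_steps   # inner step size (assumed heuristic)
    delta = np.zeros_like(x, dtype=float)
    for _ in range(outer_steps):
        # Inner loop: reverse perturbation n that LOWERS the attack loss,
        # i.e. the worst case for the attacker inside an eps_n ball.
        n = np.zeros_like(delta)
        for _ in range(inner_steps):
            _, g = ensemble_loss_and_grad(x + delta + n, y, weights, biases)
            n = np.clip(n - alpha_n * np.sign(g), -eps_n, eps_n)
        # Outer step: raise the loss at the worst-case neighbour, so the
        # final point stays adversarial across its whole neighbourhood.
        _, g = ensemble_loss_and_grad(x + delta + n, y, weights, biases)
        delta = np.clip(delta + alpha * np.sign(g), -eps, eps)
    return x + delta

# Small demo with random hypothetical surrogates and one held-out "target".
rng = np.random.default_rng(0)
surrogate_w = rng.normal(size=(4, 8))
surrogate_b = np.zeros(4)
target_w = rng.normal(size=(1, 8))
x_clean = rng.normal(size=8)
x_adv = flat_ensemble_attack(x_clean, 1.0, surrogate_w, surrogate_b)
print("surrogate loss:", ensemble_loss_and_grad(x_clean, 1.0, surrogate_w, surrogate_b)[0],
      "->", ensemble_loss_and_grad(x_adv, 1.0, surrogate_w, surrogate_b)[0])
print("target loss:", ensemble_loss_and_grad(x_clean, 1.0, target_w, np.zeros(1))[0],
      "->", ensemble_loss_and_grad(x_adv, 1.0, target_w, np.zeros(1))[0])
```

In this toy setting the random target is unrelated to the surrogates, so the demo says nothing about real transfer rates; it only shows the structure of the loop. For image models one would additionally clip `x + delta` to the valid pixel range and replace the analytic gradient with automatic differentiation.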

Problem

Research questions and friction points this paper is trying to address.

Improving adversarial transferability via flat minima optimization

Theoretical framework for provable adversarial transfer guarantees

Algorithm design for diverse surrogate model vulnerabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizing adversarial examples for flat minima

Controlling surrogate-target model shift

Generating model-diversity-compatible perturbations

🔎 Similar Papers

Enhancing Adversarial Transferability Through Neighborhood Conditional Sampling