🤖 AI Summary
This paper addresses the inherent trade-off between *exploitation* (maximizing attack success rate on the target model) and *exploration* (enhancing cross-model transferability) in adversarial attacks. To reconcile this tension, we propose Gradient-Guided Sampling (GGS), a novel method integrated within the MI-FGSM framework. GGS leverages the gradient direction from the previous iteration to guide perturbation sampling, while modulating the sampling magnitude via a controllable stochastic distribution, thereby dynamically balancing local loss maximization and global exploration. Crucially, it jointly optimizes both the loss magnitude and the flatness of the loss landscape in a single attack pass, without requiring auxiliary models or pretraining. Extensive experiments demonstrate that GGS achieves state-of-the-art transferability across diverse CNN architectures and multimodal large language models, yielding average improvements of 5.2–12.7 percentage points in cross-model attack success rates.
📝 Abstract
Adversarial attacks pose a critical challenge to the robustness of deep neural networks, particularly in transfer scenarios across different model architectures. However, the transferability of adversarial attacks faces a fundamental dilemma between Exploitation (maximizing attack potency) and Exploration (enhancing cross-model generalization). Traditional momentum-based methods over-prioritize Exploitation, i.e., they reach higher loss maxima for attack potency but weaken generalization (narrow loss surface). Conversely, recent methods with inner-iteration sampling over-prioritize Exploration, i.e., they find flatter loss surfaces for cross-model generalization but weaken attack potency (suboptimal local maxima). To resolve this dilemma, we propose a simple yet effective Gradient-Guided Sampling (GGS) method, which harmonizes both objectives by guiding sampling along the gradient ascent direction, improving both sampling efficiency and stability. Specifically, building on MI-FGSM, GGS introduces inner-iteration random sampling and guides the sampling direction using the gradient from the previous inner iteration (the sampling magnitude is drawn from a random distribution). This mechanism encourages adversarial examples to reside in balanced regions that combine flatness for cross-model generalization with higher local maxima for strong attack potency. Comprehensive experiments across multiple DNN architectures and multimodal large language models (MLLMs) demonstrate the superiority of our method over state-of-the-art transfer attacks. Code is available at https://github.com/anuin-cat/GGS.
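The mechanism described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation: the gradient oracle `grad_fn`, the uniform sampling distribution, and all hyperparameter names (`n_samples`, `sigma`, etc.) are illustrative choices, and a toy analytic loss stands in for a real model.

```python
import numpy as np

def ggs_attack(x, grad_fn, eps=0.1, alpha=0.02, steps=10,
               n_samples=5, mu=1.0, sigma=0.05, rng=None):
    """Sketch of Gradient-Guided Sampling (GGS) on top of MI-FGSM.

    Assumptions (not the paper's notation): `grad_fn(x)` returns the
    loss gradient at x; sampling magnitudes are drawn uniformly from
    [0, sigma]; `mu` is the MI-FGSM momentum decay factor.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_adv = x.copy()
    g_momentum = np.zeros_like(x)
    g_prev = grad_fn(x_adv)  # gradient guiding the first inner samples
    for _ in range(steps):
        # Inner-iteration sampling: sample points along the previous
        # gradient's sign direction with random magnitudes, then average
        # the gradients observed at those points.
        direction = np.sign(g_prev)
        g_avg = np.zeros_like(x)
        for _ in range(n_samples):
            r = rng.uniform(0.0, sigma)            # random sampling magnitude
            g_avg += grad_fn(x_adv + r * direction)
        g_avg /= n_samples
        g_prev = g_avg                             # guides the next iteration
        # Standard MI-FGSM update: accumulate L1-normalized gradient into
        # the momentum buffer, then take a signed step.
        g_momentum = mu * g_momentum + g_avg / (np.abs(g_avg).sum() + 1e-12)
        x_adv = x_adv + alpha * np.sign(g_momentum)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into the eps-ball
    return x_adv
```

As a usage example with a toy quadratic loss L(x) = 0.5 * ||x||^2 (so the gradient is simply x), the attack performs gradient ascent away from the origin while staying inside the eps-ball around the starting point:

```python
grad_fn = lambda x: x                  # gradient of 0.5 * ||x||^2
x0 = np.full(4, 0.05)
x_adv = ggs_attack(x0, grad_fn, eps=0.1, rng=np.random.default_rng(0))
```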