🤖 AI Summary
Existing transfer-based adversarial attacks neglect explicit modeling of the perturbation direction, limiting their black-box transferability. To address this, we propose the Residual Perturbation Attack (ResPA), the first method to incorporate an exponentially weighted moving average of historical gradients as a reference direction; adversarial perturbations are then constructed from the residual between the current gradient and this reference, guiding adversarial examples toward flatter regions of the loss landscape. This design explicitly models the global evolution trend of perturbations, enhancing cross-model generalization while remaining compatible with mainstream input transformation techniques (e.g., DIM, TIM). Extensive experiments on ImageNet and CIFAR-10 demonstrate that ResPA consistently outperforms state-of-the-art baselines, including MI-FGSM, DIM, and TIM, achieving average improvements of 5.2–12.7% in attack success rate across diverse white-box and black-box transfer scenarios. These results validate the critical role of residual-direction modeling in improving transferability.
📝 Abstract
Deep neural networks are susceptible to adversarial examples: imperceptible perturbations that cause incorrect predictions. Transfer-based attacks craft adversarial examples on surrogate models and transfer them to target models under black-box scenarios. Recent studies reveal that adversarial examples lying in flat regions of the loss landscape exhibit superior transferability, as they alleviate overfitting to the surrogate model. However, prior work overlooks the influence of perturbation directions, resulting in limited transferability. In this paper, we propose a novel attack method, named Residual Perturbation Attack (ResPA), which relies on the residual gradient as the perturbation direction to guide adversarial examples toward flat regions of the loss function. Specifically, ResPA applies an exponential moving average to the input gradients to obtain the first moment as the reference gradient, which encompasses the direction of historical gradients. Instead of relying heavily on the local flatness captured by the current gradient alone, ResPA further considers the residual between the current gradient and the reference gradient to capture changes in the global perturbation direction. Experimental results demonstrate better transferability of ResPA than existing typical transfer-based attack methods, and the transferability can be further improved by combining ResPA with current input transformation methods. The code is available at https://github.com/ZezeTao/ResPA.
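To make the mechanism concrete, the update described above can be sketched as an MI-FGSM-style iteration in which the step direction comes from the residual between the current gradient and an EMA reference. This is a minimal illustrative sketch, not the authors' implementation: the function names, the decay factor `mu`, the `(1 - mu)` weighting, and the sign-step update are assumptions.

```python
import numpy as np

def respa_attack(grad_fn, x, eps=8 / 255, steps=10, mu=0.9):
    """Illustrative ResPA-style attack sketch (details assumed).

    grad_fn(x_adv) returns the loss gradient w.r.t. the input on the
    surrogate model. The perturbation direction is the residual between
    the current gradient and an exponential moving average of past
    gradients (the reference gradient).
    """
    alpha = eps / steps                  # per-step budget
    x_adv = x.copy()
    m = np.zeros_like(x)                 # EMA reference gradient (first moment)
    for _ in range(steps):
        g = grad_fn(x_adv)
        m = mu * m + (1 - mu) * g        # update reference via EMA (assumed weighting)
        r = g - m                        # residual: deviation from the global trend
        x_adv = x_adv + alpha * np.sign(r)
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the eps-ball
    return np.clip(x_adv, 0.0, 1.0)      # keep a valid image range
```

As in other transfer-based attacks, `grad_fn` would wrap a surrogate network's backward pass (e.g., via PyTorch autograd), and input transformations such as DIM or TIM could be applied inside `grad_fn` before the gradient is taken.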