Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the weak cross-model transferability of adversarial examples crafted on Vision Transformers (ViTs). Whereas existing surrogate-model refinements for ViTs operate only in backward propagation, the authors propose Forward Propagation Refinement (FPR), which refines two key forward-pass modules. (1) Attention Map Diversification (AMD) diversifies selected attention maps and implicitly imposes beneficial gradient vanishing during backward propagation. (2) Momentum Token Embedding (MTE) accumulates historical token embeddings to stabilize forward updates in both the Attention and MLP blocks. Evaluated on adversarial examples transferred from ViT surrogates to various CNNs and ViTs, FPR outperforms the best existing (backward) surrogate refinement by up to 7.0% on average in attack success rate, remains effective against popular defenses, and is compatible with other transfer methods.
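The AMD idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact operator: the function name `amd_diversify`, the multiplicative-noise scheme, and the `scale` parameter are assumptions; the key property shown is that each perturbed attention map is renormalized so its rows remain valid distributions.

```python
import numpy as np

def amd_diversify(attn, scale=0.5, rng=None):
    """Sketch of Attention Map Diversification (AMD): perturb attention
    maps with multiplicative random noise during the forward pass, then
    renormalize each row back into a probability distribution.
    `attn` has shape (..., tokens, tokens) with rows summing to 1."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(1.0 - scale, 1.0 + scale, size=attn.shape)
    diversified = attn * noise
    # Renormalize so every row is still a valid attention distribution.
    return diversified / diversified.sum(axis=-1, keepdims=True)
```

In a ViT surrogate this would be applied to selected heads/layers during the forward pass of the attack iteration, e.g. via a forward hook on the attention module.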

📝 Abstract
Vision Transformers (ViTs) have been widely applied in various computer vision and vision-language tasks. To gain insights into their robustness in practical scenarios, transferable adversarial examples on ViTs have been extensively studied. A typical approach to improving adversarial transferability is by refining the surrogate model. However, existing work on ViTs has restricted their surrogate refinement to backward propagation. In this work, we instead focus on Forward Propagation Refinement (FPR) and specifically refine two key modules of ViTs: attention maps and token embeddings. For attention maps, we propose Attention Map Diversification (AMD), which diversifies certain attention maps and also implicitly imposes beneficial gradient vanishing during backward propagation. For token embeddings, we propose Momentum Token Embedding (MTE), which accumulates historical token embeddings to stabilize the forward updates in both the Attention and MLP blocks. We conduct extensive experiments with adversarial examples transferred from ViTs to various CNNs and ViTs, demonstrating that our FPR outperforms the current best (backward) surrogate refinement by up to 7.0% on average. We also validate its superiority against popular defenses and its compatibility with other transfer methods. Codes and appendix are available at https://github.com/RYC-98/FPR.
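The MTE component in the abstract (accumulating historical token embeddings to stabilize forward updates) resembles an exponential moving average across attack iterations. The sketch below is an assumption for illustration only; the function name `mte_update`, the EMA form, and the `momentum` coefficient are not taken from the paper.

```python
import numpy as np

def mte_update(ema, current, momentum=0.9):
    """Sketch of Momentum Token Embedding (MTE): maintain a running
    average of token embeddings across iterations and feed the blended
    embedding forward through the Attention and MLP blocks."""
    if ema is None:
        # First iteration: initialize the history with the current embedding.
        return current.copy()
    return momentum * ema + (1.0 - momentum) * current
```

Each attack iteration would call `mte_update` on the token embeddings entering a block, so transient fluctuations in a single forward pass are damped by the accumulated history.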
Problem

Research questions and friction points this paper is trying to address.

Improving the cross-model transferability of adversarial examples crafted on Vision Transformers (ViTs).
Refining ViT surrogates during forward propagation, specifically their attention maps and token embeddings.
Understanding the practical robustness of ViTs and CNNs against transferable adversarial examples.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Forward Propagation Refinement (FPR): refines the surrogate model in the forward pass rather than in backward propagation, improving adversarial transferability.
Attention Map Diversification (AMD): diversifies selected attention maps and implicitly imposes beneficial gradient vanishing during backward propagation.
Momentum Token Embedding (MTE): accumulates historical token embeddings to stabilize forward updates in both the Attention and MLP blocks.
Yuchen Ren
Renmin University of China
Zhengyu Zhao
Xi'an Jiaotong University, China
Adversarial Machine Learning · Computer Vision
Chenhao Lin
Xi’an Jiaotong University, China
Bo Yang
Information Engineering University, China
Lu Zhou
Nanjing University of Aeronautics and Astronautics, China
Zhe Liu
Nanjing University of Aeronautics and Astronautics, China
Chao Shen
Xi’an Jiaotong University, China