Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the weak cross-model transferability of adversarial examples crafted on Vision Transformers (ViTs). Whereas existing surrogate-model refinements for ViTs operate only in backward propagation, the authors propose Forward Propagation Refinement (FPR), which refines two key forward-pass modules. (1) Attention Map Diversification (AMD) diversifies selected attention maps and implicitly imposes beneficial gradient vanishing during backward propagation. (2) Momentum Token Embedding (MTE) accumulates historical token embeddings to stabilize forward updates in both the Attention and MLP blocks. Evaluated on adversarial examples transferred from ViT surrogates to various CNNs and ViTs, FPR outperforms the best existing (backward) surrogate refinement by up to 7.0% on average in attack success rate, remains effective against popular defenses, and is compatible with other transfer methods.
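The AMD idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact operator: the function name `amd_diversify`, the multiplicative-noise scheme, and the `scale` parameter are assumptions; the key property shown is that each perturbed attention map is renormalized so its rows remain valid distributions.

```python
import numpy as np

def amd_diversify(attn, scale=0.5, rng=None):
    """Sketch of Attention Map Diversification (AMD): perturb attention
    maps with multiplicative random noise during the forward pass, then
    renormalize each row back into a probability distribution.
    `attn` has shape (..., tokens, tokens) with rows summing to 1."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(1.0 - scale, 1.0 + scale, size=attn.shape)
    diversified = attn * noise
    # Renormalize so every row is still a valid attention distribution.
    return diversified / diversified.sum(axis=-1, keepdims=True)
```

In a ViT surrogate this would be applied to selected heads/layers during the forward pass of the attack iteration, e.g. via a forward hook on the attention module.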

📝 Abstract
Vision Transformers (ViTs) have been widely applied in various computer vision and vision-language tasks. To gain insights into their robustness in practical scenarios, transferable adversarial examples on ViTs have been extensively studied. A typical approach to improving adversarial transferability is by refining the surrogate model. However, existing work on ViTs has restricted their surrogate refinement to backward propagation. In this work, we instead focus on Forward Propagation Refinement (FPR) and specifically refine two key modules of ViTs: attention maps and token embeddings. For attention maps, we propose Attention Map Diversification (AMD), which diversifies certain attention maps and also implicitly imposes beneficial gradient vanishing during backward propagation. For token embeddings, we propose Momentum Token Embedding (MTE), which accumulates historical token embeddings to stabilize the forward updates in both the Attention and MLP blocks. We conduct extensive experiments with adversarial examples transferred from ViTs to various CNNs and ViTs, demonstrating that our FPR outperforms the current best (backward) surrogate refinement by up to 7.0% on average. We also validate its superiority against popular defenses and its compatibility with other transfer methods. Codes and appendix are available at https://github.com/RYC-98/FPR.
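The MTE component in the abstract (accumulating historical token embeddings to stabilize forward updates) resembles an exponential moving average across attack iterations. The sketch below is an assumption for illustration only; the function name `mte_update`, the EMA form, and the `momentum` coefficient are not taken from the paper.

```python
import numpy as np

def mte_update(ema, current, momentum=0.9):
    """Sketch of Momentum Token Embedding (MTE): maintain a running
    average of token embeddings across iterations and feed the blended
    embedding forward through the Attention and MLP blocks."""
    if ema is None:
        # First iteration: initialize the history with the current embedding.
        return current.copy()
    return momentum * ema + (1.0 - momentum) * current
```

Each attack iteration would call `mte_update` on the token embeddings entering a block, so transient fluctuations in a single forward pass are damped by the accumulated history.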
Problem

Research questions and friction points this paper is trying to address.

Improving the cross-model transferability of adversarial examples crafted on Vision Transformers (ViTs).
Refining ViT surrogates during forward propagation, specifically their attention maps and token embeddings.
Understanding the practical robustness of ViTs and CNNs against transferable adversarial examples.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Forward Propagation Refinement (FPR): refines the surrogate model in the forward pass rather than in backward propagation, improving adversarial transferability.
Attention Map Diversification (AMD): diversifies selected attention maps and implicitly imposes beneficial gradient vanishing during backward propagation.
Momentum Token Embedding (MTE): accumulates historical token embeddings to stabilize forward updates in both the Attention and MLP blocks.
Yuchen Ren
Renmin University of China
Zhengyu Zhao
Xi'an Jiaotong University, China
Adversarial Machine Learning · Computer Vision
Chenhao Lin
Xi’an Jiaotong University, China
Bo Yang
Information Engineering University, China
Lu Zhou
Nanjing University of Aeronautics and Astronautics, China
Zhe Liu
Nanjing University of Aeronautics and Astronautics, China
Chao Shen
Xi’an Jiaotong University, China