ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers

📅 2025-08-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Adversarial examples crafted on Vision Transformers (ViTs) transfer poorly to other models, and existing ensemble-based attacks overlook strengthening the surrogate models themselves. Method: This paper proposes the first enhanced ensemble attack framework tailored to ViTs. Its core innovation is adversarial model augmentation to improve ensemble quality: three ViT-specific augmentation strategies—multi-head dropping, attention score scaling, and MLP feature mixing—with their hyperparameters tuned automatically by Bayesian optimization, combined with automatic reweighting and step size enlargement mechanisms. Results: Extensive experiments across diverse ViT architectures show that the method substantially raises cross-model transfer attack success rates, consistently outperforming state-of-the-art ensemble attacks, and empirically supports the claim that model augmentation is critical for adversarial generalization in ViT-based attacks.

📝 Abstract
Ensemble-based attacks have been proven to be effective in enhancing adversarial transferability by aggregating the outputs of models with various architectures. However, existing research primarily focuses on refining ensemble weights or optimizing the ensemble path, overlooking the exploration of ensemble models to enhance the transferability of adversarial attacks. To address this gap, we propose applying adversarial augmentation to the surrogate models, aiming to boost overall generalization of ensemble models and reduce the risk of adversarial overfitting. Meanwhile, observing that ensemble Vision Transformers (ViTs) gain less attention, we propose ViT-EnsembleAttack based on the idea of model adversarial augmentation, the first ensemble-based attack method tailored for ViTs to the best of our knowledge. Our approach generates augmented models for each surrogate ViT using three strategies: Multi-head dropping, Attention score scaling, and MLP feature mixing, with the associated parameters optimized by Bayesian optimization. These adversarially augmented models are ensembled to generate adversarial examples. Furthermore, we introduce Automatic Reweighting and Step Size Enlargement modules to boost transferability. Extensive experiments demonstrate that ViT-EnsembleAttack significantly enhances the adversarial transferability of ensemble-based attacks on ViTs, outperforming existing methods by a substantial margin. Code is available at https://github.com/Trustworthy-AI-Group/TransferAttack.
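The three augmentation strategies named in the abstract can be pictured with a toy NumPy sketch. Everything here—the function names, the `scale_gamma` and `lam` parameters, and the exact dropping/mixing rules—is an illustrative guess at the idea, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def augmented_attention(q, k, v, scale_gamma=1.0, drop_head=None):
    """Toy multi-head self-attention with two of the augmentations:
    attention score scaling (multiply the logits by scale_gamma) and
    multi-head dropping (zero out one head's output)."""
    heads, tokens, dim = q.shape
    logits = q @ k.transpose(0, 2, 1) / np.sqrt(dim)  # (heads, tokens, tokens)
    attn = softmax(scale_gamma * logits, axis=-1)     # attention score scaling
    out = attn @ v                                    # (heads, tokens, dim)
    if drop_head is not None:
        out[drop_head] = 0.0                          # multi-head dropping
    return out

def mlp_feature_mix(features, lam=0.9, rng=None):
    """Toy MLP feature mixing: blend each token's features with those of a
    randomly permuted token (an illustrative guess at the mixing rule)."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(features.shape[0])
    return lam * features + (1 - lam) * features[idx]
```

Applying such perturbations to each surrogate ViT yields a family of augmented models whose ensemble, per the abstract, generalizes better and overfits less to any single architecture.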
Problem

Research questions and friction points this paper is trying to address.

Enhancing adversarial transferability in Vision Transformers via ensemble models
Addressing overlooked exploration of ensemble models for stronger attacks
Reducing adversarial overfitting risk through surrogate model augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial augmentation of surrogate models
Multi-head dropping and attention scaling
Bayesian optimization for parameter tuning
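The paper tunes the augmentation hyperparameters with Bayesian optimization; as a minimal stand-in, a plain random search over a hypothetical search space conveys the interface (the parameter names and ranges below are assumptions, not taken from the paper):

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Simplified stand-in for the Bayesian optimization step: sample
    hyperparameters uniformly from the given ranges and keep the
    configuration with the highest objective value."""
    rng = random.Random(seed)
    best_cfg, best_val = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        val = objective(cfg)
        if val > best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

# Hypothetical ranges for the three augmentation strengths.
space = {"head_drop_p": (0.0, 0.5), "attn_scale": (0.5, 1.5), "mlp_mix": (0.0, 1.0)}
```

In the paper's setting, the objective would score each configuration by the transferability of adversarial examples generated with the augmented ensemble; a Bayesian optimizer replaces the uniform sampling with a surrogate-model-guided proposal.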
Hanwen Cao
University of California, San Diego
Robotics
Haobo Lu
School of Computer Science and Technology, Huazhong University of Science and Technology
Xiaosen Wang
Huazhong University of Science and Technology
AI Security · AI Safety · Trustworthy AI · Adversarial Learning
Kun He
School of Computer Science and Technology, Huazhong University of Science and Technology