Enhancing Adversarial Transferability with Adversarial Weight Tuning

📅 2024-08-18
🏛️ AAAI Conference on Artificial Intelligence
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
Adversarial examples (AEs) often transfer poorly across models in black-box attacks. Method: This paper proposes a unified, generalization-based metric for adversarial transferability and shows how model smoothness and flat local maxima reinforce each other. Building on a theoretical analysis of their interrelationship, it introduces Adversarial Weight Tuning (AWT), a data-free method that adaptively tunes the surrogate model's parameters with the generated AEs, jointly optimizing gradient-based updates, model smoothness, and flat local maxima. Contribution/Results: Evaluated on ImageNet, the approach significantly improves transfer attack success rates, outperforming state-of-the-art attacks by an average of roughly 5% on CNN-based target models and 10% on Transformer-based targets, demonstrating strong generalization and practical effectiveness in cross-architecture black-box attacks.

📝 Abstract
Deep neural networks (DNNs) are vulnerable to adversarial examples (AEs) that mislead the model while appearing benign to human observers. A critical concern is the transferability of AEs, which enables black-box attacks without direct access to the target model. However, many previous attacks have failed to explain the intrinsic mechanism of adversarial transferability, lacking a unified and representative metric for transferability as well. In this paper, we rethink the property of transferable AEs and develop a novel metric to measure transferability from the perspective of generalization. Building on insights from this metric, we analyze the generalization of AEs across models with different architectures and prove that we can find a local perturbation to mitigate the gap between surrogate and target models. We further establish the inner connections between model smoothness and flat local maxima, both of which contribute to the transferability of AEs. Further, we propose a new adversarial attack algorithm, Adversarial Weight Tuning (AWT), which adaptively adjusts the parameters of the surrogate model using generated AEs to optimize the flat local maxima and model smoothness simultaneously, without the need for extra data. AWT is a data-free tuning method that combines gradient-based and model-related attack methods to enhance the transferability of AEs. Extensive experiments on a variety of models with different architectures on ImageNet demonstrate that AWT yields superior performance over other attacks, with an average increase of nearly 5% and 10% attack success rates on CNN-based and Transformer-based models, respectively, compared to state-of-the-art attacks.
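As an informal illustration of the generalization view of transferability described above, one can craft an AE on a surrogate model and measure how often it fools a pool of held-out target models, analogous to a generalization gap over unseen architectures. The PyTorch-style sketch below is only such a proxy: the function name transfer_rate, the PGD-style crafting loop, and the hyperparameters are illustrative assumptions, not the paper's actual metric.

```python
import torch
import torch.nn.functional as F

def transfer_rate(x, y, surrogate, targets, eps=8 / 255, alpha=2 / 255, steps=10):
    """Craft a PGD-style AE on `surrogate`, then report the fraction of
    held-out `targets` it misleads (a generalization-style proxy)."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # project back to the L_inf ball
            x_adv = x_adv.clamp(0, 1)
    with torch.no_grad():
        # Average fooling rate over unseen target models.
        fooled = [(t(x_adv).argmax(dim=1) != y).float().mean().item() for t in targets]
    return sum(fooled) / len(fooled)
```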
Problem

Research questions and friction points this paper is trying to address.

Explaining the intrinsic mechanism of adversarial example transferability across models
Mitigating the performance gap between surrogate and target models
Enhancing adversarial transferability through adaptive model parameter tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptively tunes surrogate model parameters
Optimizes flat local maxima and model smoothness
Combines gradient-based and model-related attack methods (see the sketch below)
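The bullets above outline the tuning loop at a high level. The sketch below shows one way such an alternation could look in PyTorch, assuming a standard classifier as the surrogate: an I-FGSM-style AE update followed by a data-free weight-tuning step that penalizes the input-gradient norm as a smoothness proxy. The function awt_style_attack, the choice of regularizer, and the hyperparameters are assumptions for illustration; the paper's actual objective jointly targets flat local maxima and model smoothness with its own formulation.

```python
import copy
import torch
import torch.nn.functional as F

def awt_style_attack(x, y, surrogate, eps=8 / 255, alpha=2 / 255, steps=10, tune_lr=1e-4):
    """Alternate AE crafting with data-free surrogate weight tuning.
    The smoothness proxy below is an assumption; the paper's objective also
    promotes flat local maxima and differs in its exact terms."""
    model = copy.deepcopy(surrogate)                      # tune a copy, keep the original
    opt = torch.optim.SGD(model.parameters(), lr=tune_lr)
    x_adv = x.clone().detach()

    for _ in range(steps):
        # (1) Gradient-based AE update (I-FGSM style) on the current surrogate.
        x_adv.requires_grad_(True)
        grad = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)[0]
        with torch.no_grad():
            x_adv = x + (x_adv + alpha * grad.sign() - x).clamp(-eps, eps)
            x_adv = x_adv.clamp(0, 1)

        # (2) Data-free weight tuning on the generated AE itself: penalize the
        #     input-gradient norm as a stand-in smoothness regularizer.
        x_in = x_adv.clone().requires_grad_(True)
        g = torch.autograd.grad(F.cross_entropy(model(x_in), y), x_in, create_graph=True)[0]
        smooth_penalty = g.flatten(1).norm(dim=1).mean()
        opt.zero_grad()
        smooth_penalty.backward()
        opt.step()

    return x_adv.detach()
```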
Jiahao Chen
College of Computer Science and Technology, Zhejiang University
Zhou Feng
Zhejiang University
AI Security
Rui Zeng
College of Computer Science and Technology, Zhejiang University
Yuwen Pu
College of Computer Science and Technology, Zhejiang University
Chunyi Zhou
Zhejiang University
Cyberspace Security · Machine Learning Privacy · Federated Learning
Yi Jiang
College of Computer Science and Technology, Zhejiang University
Yuyou Gan
College of Computer Science and Technology, Zhejiang University
Jinbao Li
Department of Geography, University of Hong Kong
Climate Change · Paleoclimate · ENSO · Drought · Dendrochronology
Shouling Ji
Professor, Zhejiang University & Georgia Institute of Technology
Data-driven Security · AI Security · Software Security · Privacy