Improving the Transferability of Adversarial Examples by Inverse Knowledge Distillation

📅 2025-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Weak transferability in black-box adversarial attacks stems from existing methods ignoring architectural disparities between source and target models. To address this, we propose Inverse Knowledge Distillation (IKD), the first approach to reverse the knowledge distillation paradigm for gradient-based attacks: it incorporates a distillation-style loss into standard frameworks (e.g., PGD, MI-FGSM) to enforce model-agnostic regularization at the gradient level, thereby mitigating overfitting to the source model. Additionally, IKD integrates gradient diversity constraints across multiple surrogate models with ensemble-based optimization. Evaluated on ImageNet, IKD significantly enhances cross-architecture transferability—achieving an average 12.7% improvement in attack success rate—and establishes new state-of-the-art performance across 12 mainstream heterogeneous models, demonstrating superior generalization and robustness.
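The summary names the attack family IKD plugs into: momentum-based gradient attacks (MI-FGSM) run over an ensemble of surrogate models. The paper's exact IKD loss is not reproduced on this page, so the sketch below shows only the plain ensemble MI-FGSM backbone on toy linear surrogates, with a comment marking where a distillation-style regularizer would be added; the function names and the linear-model setup are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector z."""
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(W, x, y):
    """Cross-entropy loss of a linear surrogate (logits = W @ x) on label y."""
    return -np.log(softmax(W @ x)[y] + 1e-12)

def ce_grad(W, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = softmax(W @ x)
    p[y] -= 1.0              # p - one_hot(y)
    return W.T @ p

def ensemble_mi_fgsm(x, y, surrogates, eps=0.5, steps=10, mu=1.0):
    """MI-FGSM over an ensemble of surrogates, inside an L-inf ball of radius eps.

    Sketch only: IKD's distillation-inspired loss would modify the per-step
    gradient below; this version uses the plain ensemble-averaged gradient.
    """
    alpha = eps / steps          # per-step size
    g = np.zeros_like(x)         # momentum accumulator
    x_adv = x.copy()
    for _ in range(steps):
        # Ensemble term: average the input-gradient across surrogate models.
        grad = np.mean([ce_grad(W, x_adv, y) for W in surrogates], axis=0)
        # <- a distillation-style (IKD) regularizer on `grad` would plug in here
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)  # momentum update
        x_adv = x_adv + alpha * np.sign(g)                # sign step (loss ascent)
        x_adv = np.clip(x_adv, x - eps, x + eps)          # project to eps-ball
    return x_adv
```

Because the per-step gradient is averaged over all surrogates before the momentum update, the perturbation is pushed toward directions that raise the loss of every surrogate at once, which is the overfitting-reduction idea the IKD loss then strengthens.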

📝 Abstract
In recent years, the rapid development of deep neural networks has brought increased attention to the security and robustness of these models. While existing adversarial attack algorithms have demonstrated success in improving adversarial transferability, their performance remains suboptimal due to a lack of consideration for the discrepancies between target and source models. To address this limitation, we propose a novel method, Inverse Knowledge Distillation (IKD), designed to enhance adversarial transferability effectively. IKD introduces a distillation-inspired loss function that seamlessly integrates with gradient-based attack methods, promoting diversity in attack gradients and mitigating overfitting to specific model architectures. By diversifying gradients, IKD enables the generation of adversarial samples with superior generalization capabilities across different models, significantly enhancing their effectiveness in black-box attack scenarios. Extensive experiments on the ImageNet dataset validate the effectiveness of our approach, demonstrating substantial improvements in the transferability and attack success rates of adversarial samples across a wide range of models.
Problem

Research questions and friction points this paper is trying to address.

Enhance adversarial transferability across models
Mitigate overfitting to specific model architectures
Improve black-box attack effectiveness with diversified gradients
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inverse Knowledge Distillation (IKD), a distillation-inspired loss for gradient-based attacks
Diversifies attack gradients to avoid overfitting the source model
Enhances black-box attack transferability
Wenyuan Wu
College of Computer Science, Sichuan University, China
Zheng Liu
Sichuan Newstrong UHD Video Technology Company Ltd., China
Yong Chen
Institute of Optics and Electronics, Chinese Academy of Sciences, China
Chao Su
Beijing Institute of Technology
Research areas: Natural Language Processing, Machine Translation
Dezhong Peng
Sichuan University
Research areas: Multi-modal Learning, Multimedia Analysis, Neural Network
Xu Wang
College of Computer Science, Sichuan University, China