Transferable Backdoor Attacks for Code Models via Sharpness-Aware Adversarial Perturbation

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the longstanding trade-off between transferability and stealthiness in backdoor attacks against code models: static triggers are easily detectable, while dynamic triggers rely on unrealistic same-distribution assumptions and transfer poorly across datasets. To overcome this, the authors propose STAB, the first backdoor attack that leverages loss-landscape flatness, generating context-aware adversarial triggers within flat regions to achieve both high stealthiness and strong cross-dataset transferability. The method trains a surrogate model with Sharpness-Aware Minimization and uses Gumbel-Softmax to make the search over discrete trigger tokens differentiable. Experiments on three datasets and two code models show that STAB attains a 73.2% average attack success rate under defenses, outperforms the best dynamic attack by 12.4% in cross-dataset settings, and preserves accuracy on clean samples.
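The first ingredient the summary names, Sharpness-Aware Minimization, is concrete enough to sketch. Below is a minimal PyTorch sketch of the standard two-pass SAM update used to steer a surrogate model toward flat loss regions; `sam_step`, `model`, `loss_fn`, `base_optimizer`, and `rho` are illustrative names under assumed PyTorch conventions, not the paper's implementation.

```python
import torch

def sam_step(model, loss_fn, inputs, labels, base_optimizer, rho=0.05):
    # 1) First pass: gradients of the loss at the current weights w.
    loss = loss_fn(model(inputs), labels)
    loss.backward()

    # 2) Ascent step: move to the locally worst-case weights w + e,
    #    where e is the gradient normalized and scaled to L2 norm rho.
    grads = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in grads))
    eps = []
    with torch.no_grad():
        for p in grads:
            e = rho * p.grad / (grad_norm + 1e-12)
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # 3) Second pass: gradients at w + e approximate the sharpness-aware
    #    objective max_{||e|| <= rho} L(w + e), penalizing sharp minima.
    loss_fn(model(inputs), labels).backward()

    # 4) Restore the original weights, update with second-pass gradients.
    with torch.no_grad():
        for p, e in zip(grads, eps):
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()
```

Updating with gradients taken at the perturbed weights discourages sharp minima, which is the flatness property the paper ties to cross-dataset transfer of adversarial triggers.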

📝 Abstract
Code models are increasingly adopted in software development but remain vulnerable to backdoor attacks via poisoned training data. Existing backdoor attacks on code models face a fundamental trade-off between transferability and stealthiness. Static trigger-based attacks insert fixed dead code patterns that transfer well across models and datasets but are easily detected by code-specific defenses. In contrast, dynamic trigger-based attacks adaptively generate context-aware triggers to evade detection but suffer from poor cross-dataset transferability. Moreover, they rely on unrealistic assumptions of identical data distributions between poisoned and victim training data, limiting their practicality. To overcome these limitations, we propose Sharpness-aware Transferable Adversarial Backdoor (STAB), a novel attack that achieves both transferability and stealthiness without requiring complete victim data. STAB is motivated by the observation that adversarial perturbations in flat regions of the loss landscape transfer more effectively across datasets than those in sharp minima. To this end, we train a surrogate model using Sharpness-Aware Minimization to guide model parameters toward flat loss regions, and employ Gumbel-Softmax optimization to enable differentiable search over discrete trigger tokens for generating context-aware adversarial triggers. Experiments across three datasets and two code models show that STAB outperforms prior attacks in terms of transferability and stealthiness. It achieves a 73.2% average attack success rate after defense, outperforming static trigger-based attacks that fail under defense. STAB also surpasses the best dynamic trigger-based attack by 12.4% in cross-dataset attack success rate and maintains performance on clean inputs.
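The abstract's other component, differentiable search over discrete trigger tokens via Gumbel-Softmax, can be sketched the same way. Everything below is an illustrative assumption, not the paper's model or objective: the toy `surrogate`, the embedding table, the sizes, and the cross-entropy loss stand in for a loss that would score how reliably trigger-bearing code inputs flip the SAM-trained surrogate to the attacker's target label.

```python
import torch
import torch.nn.functional as F

VOCAB, TRIG_LEN, DIM, TARGET = 1000, 5, 64, 1    # toy sizes (assumptions)

embed = torch.nn.Embedding(VOCAB, DIM)           # stand-in token embeddings
surrogate = torch.nn.Linear(TRIG_LEN * DIM, 2)   # toy stand-in surrogate model

# Learnable logits over the vocabulary, one distribution per trigger position.
trigger_logits = torch.zeros(TRIG_LEN, VOCAB, requires_grad=True)
opt = torch.optim.Adam([trigger_logits], lr=0.1)

for step in range(200):
    tau = max(0.1, 1.0 - step / 200)             # anneal relaxation temperature
    # hard=True: the forward pass picks a discrete one-hot token per position,
    # while gradients flow through the soft sample (straight-through trick).
    one_hot = F.gumbel_softmax(trigger_logits, tau=tau, hard=True)
    trig_emb = (one_hot @ embed.weight).flatten()    # (TRIG_LEN * DIM,)
    # Push the surrogate's prediction on the trigger toward the target label.
    loss = F.cross_entropy(surrogate(trig_emb).unsqueeze(0),
                           torch.tensor([TARGET]))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Final discrete trigger: the most likely token at each position.
trigger_tokens = trigger_logits.argmax(dim=-1)
```

The `hard=True` relaxation keeps the forward pass discrete (a valid token per position) while letting gradients reach the trigger logits, and annealing `tau` tightens the soft samples toward truly discrete choices.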
Problem

Research questions and friction points this paper is trying to address.

backdoor attacks
code models
transferability
stealthiness
adversarial perturbation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sharpness-Aware Minimization
Transferable Backdoor Attack
Adversarial Perturbation
Gumbel-Softmax Optimization
Code Models
Shuyu Chang
Nanjing University of Posts and Telecommunications
AI and Security · Text Mining
Haiping Huang
Nanjing University of Posts and Telecommunications
Internet of Things · Blockchain · Data Security
Yanjun Zhang
Lecturer, University of Technology Sydney
Security and Privacy · Machine Learning
Yujin Huang
University of Melbourne
Trustworthy On-device AI · Software Security
Fu Xiao
School of Computer Science, Nanjing University of Posts and Telecommunications, China; State Key Laboratory of Tibetan Intelligence, China; Jiangsu Provincial Key Laboratory of Internet of Things Intelligent Perception and Computing, China
Leo Yu Zhang
Griffith University, Australia