Semantically-Equivalent Transformations-Based Backdoor Attacks against Neural Code Models: Characterization and Mitigation

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing backdoor defenses for neural code models primarily target injection-based attacks, whose anomalous triggers can be sanitized away, creating a false sense of security. This work introduces a new backdoor paradigm grounded in Semantically-Equivalent Transformations (SET): it generates stealthy triggers via infrequent yet semantics-preserving code rewrites, effectively evading mainstream defenses. The authors propose the first automated framework for SET-trigger generation, integrating fine-grained code semantic analysis with formal modeling of program transformation rules. Evaluated on CodeBERT, CodeT5, and StarCoder across five downstream tasks and six programming languages, the attacks achieve >90% success rates while lowering detection rates by an average of 25.13%; conventional normalization-based mitigation strategies prove only partially effective. The study uncovers a previously overlooked threat dimension, semantics-preserving backdoors, and substantially expands the known attack surface of code intelligence models.

📝 Abstract
Neural code models have been increasingly incorporated into software development processes. However, their susceptibility to backdoor attacks presents a significant security risk. The state-of-the-art understanding focuses on injection-based attacks, which insert anomalous patterns into software code and can therefore be neutralized by standard sanitization techniques. This status quo may lead to a false sense of security regarding backdoor attacks. In this paper, we introduce a new kind of backdoor attack, dubbed the Semantically-Equivalent Transformation (SET)-based backdoor attack, which uses semantics-preserving, low-prevalence code transformations to generate stealthy triggers. We propose a framework to guide the generation of such triggers. Our experiments across five tasks, six programming languages, and models including CodeBERT, CodeT5, and StarCoder show that SET-based attacks achieve high success rates (often >90%) while preserving model utility. The attack is highly stealthy, evading state-of-the-art defenses with detection rates on average 25.13% lower than those of injection-based counterparts. We evaluate normalization-based countermeasures and find that they offer only partial mitigation, confirming the attack's robustness. These results motivate further investigation into scalable defenses tailored to SET-based attacks.
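The abstract's core idea, using a low-prevalence but semantics-preserving rewrite as a trigger, can be illustrated with a small sketch. This example is not taken from the paper; the function names and the specific rewrite are hypothetical, chosen only to show what a SET-style trigger could look like:

```python
def count_matches(items, target):
    # Original, idiomatic form.
    return sum(1 for x in items if x == target)

def count_matches_rewritten(items, target):
    # Semantically equivalent but statistically rare rewrite:
    # an index-based while loop with a double-negated comparison.
    # A model poisoned on such patterns could treat the unusual
    # surface form as a backdoor trigger.
    total, i = 0, 0
    while i < len(items):
        if not (items[i] != target):
            total += 1
        i += 1
    return total

data = [1, 2, 2, 3, 2]
# Both forms compute the same result, so the rewrite is invisible
# to any check that considers only program behavior.
assert count_matches(data, 2) == count_matches_rewritten(data, 2) == 3
```

Because the trigger is ordinary, compilable code rather than an injected anomalous token, sanitizers that look for dead code or unusual identifiers have nothing to flag.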
Problem

Research questions and friction points this paper is trying to address.

Introduces stealthy backdoor attacks using semantics-preserving code transformations
Evaluates attack effectiveness across multiple programming languages and models
Assesses limitations of current defenses and motivates new mitigation strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Semantically-Equivalent Transformation-based stealthy backdoor attacks
Proposes a framework to generate low-prevalence code triggers
Shows attacks evade defenses and require tailored mitigation
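The normalization-based countermeasures the paper evaluates can be sketched in miniature: parsing code to an AST and re-emitting it canonicalizes formatting-level variation, but it cannot undo structural rewrites, which is consistent with the reported finding of only partial mitigation. The sketch below uses Python's standard `ast` module and is an assumption-laden illustration, not the defense evaluated in the paper:

```python
import ast

# Two surface variants of the same function: odd line breaks,
# redundant parentheses, and a comment.
variant_a = "def f(x):\n    return (x\n            + 1)  # note"
variant_b = "def f(x):\n    return x + 1"

def normalize(src: str) -> str:
    # Round-trip through the AST: layout, comments, and redundant
    # parentheses disappear, yielding one canonical rendering.
    return ast.unparse(ast.parse(src))

# Formatting-level differences are normalized away...
assert normalize(variant_a) == normalize(variant_b)

# ...but a structural rewrite (here, an inserted dead loop and a
# temporary variable) survives AST round-tripping, so a SET-style
# trigger of this kind would remain in the normalized code.
variant_c = "def f(x):\n    y = x\n    while False:\n        pass\n    return y + 1"
assert normalize(variant_c) != normalize(variant_b)
```

This gap between surface normalization and semantic equivalence is precisely why defenses tailored to SET-based attacks would need deeper program analysis than token- or format-level cleaning.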
Junyao Ye
Huazhong University of Science and Technology, China
Zhen Li
Huazhong University of Science and Technology, China
Xi Tang
Huazhong University of Science and Technology, China
Shouhuai Xu
Gallogly Chair Professor in Cybersecurity, University of Colorado Colorado Springs
Research interests: Cyber Resilience, Cybersecurity Dynamics, Cybersecurity Metrics, Cybersecurity Analytics, Crypto
Deqing Zou
Huazhong University of Science and Technology, China
Zhongsheng Yuan
Huazhong University of Science and Technology, China