Poison with Style: A Practical Poisoning Attack on Code Large Language Models

📅 2026-05-26
🤖 AI Summary
This work addresses the vulnerability of code large language models (CLLMs) to stealthy poisoning attacks by proposing Poison-with-Style (PwS), a method that uses developers' own coding styles as implicit backdoor triggers. Unlike conventional approaches, PwS requires no modification of user inputs at inference time; instead, the vulnerable behavior is implanted during fine-tuning through style-based trigger construction, targeted data collection, and a two-step fine-tuning strategy. This keeps the attack covert while preserving the model's normal functionality. On Python code completion, PwS-poisoned models generate CWE-20 (improper input validation) vulnerable code in 95% of cases when the trigger code style is present, with less than a 5% drop in pass@1 on the HumanEval and MBPP benchmarks. The approach significantly outperforms existing attacks and evades state-of-the-art defense mechanisms.
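For readers unfamiliar with the weakness class named above, the following is a minimal, generic sketch of what CWE-20 (improper input validation) looks like in Python. It is not taken from the paper or its dataset; the function names `parse_port_unvalidated` and `parse_port_validated` are hypothetical and chosen only for illustration.

```python
def parse_port_unvalidated(raw: str) -> int:
    """CWE-20 pattern: the value is converted and used without any range check."""
    # Accepts values such as "70000" or "0" that are not valid TCP ports.
    return int(raw)


def parse_port_validated(raw: str) -> int:
    """Same task with explicit validation of format and range."""
    # Reject anything that is not a plain ASCII decimal number.
    if not (raw.isascii() and raw.isdigit()):
        raise ValueError(f"port must be a decimal integer, got {raw!r}")
    port = int(raw)
    # Reject numbers outside the valid TCP/UDP port range.
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

A completion that silently produces the first form where the second is expected is the kind of output a vulnerability-injecting model would aim for, which is why such code can look functionally correct on ordinary benchmark tests.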
๐Ÿ“ Abstract
Code Large Language Models (CLLMs) serve as the core of modern code agents, enabling developers to automate complex software development tasks. In this paper, we present Poison-with-Style (PwS), a practical and stealthy model poisoning attack targeting CLLMs. Unlike prior attacks that assume an active adversary capable of directly embedding explicit triggers (e.g., specific words) into developers' prompts during inference, PwS leverages developers' code styles as covert triggers implicitly embedded within their prompts. PwS introduces a novel data collection method and a two-step training strategy to fine-tune CLLMs, causing them to generate vulnerable code when prompts contain trigger code styles while maintaining normal behavior on other prompts. Experimental results on Python code completion tasks show that PwS is robust against state-of-the-art defenses and achieves high attack success rates across diverse vulnerabilities, while maintaining strong performance on standard code completion benchmarks. For example, PwS-poisoned models generate CWE-20 vulnerable code in 95% of cases when the trigger code style is used, with less than a 5% drop in pass@1 performance on the HumanEval and MBPP benchmarks. Our implementation and dataset are here: https://github.com/khangtran2020/pws.
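The pass@1 figures quoted above are usually computed with the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples per problem and c the number that pass the tests. The sketch below shows that estimator and two ways to read a "drop" in pass@1. The abstract does not say whether its 5% is absolute or relative, so both are computed here; the helper names and the example scores are assumptions, not values from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples are incorrect, every k-subset contains a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def absolute_drop(clean: float, poisoned: float) -> float:
    """Drop in percentage points, e.g. 0.60 -> 0.57 is 0.03."""
    return clean - poisoned


def relative_drop(clean: float, poisoned: float) -> float:
    """Drop as a fraction of the clean score, e.g. 0.60 -> 0.57 is 0.05."""
    if clean <= 0:
        raise ValueError("clean score must be positive")
    return (clean - poisoned) / clean


# Hypothetical scores, for illustration only.
clean_score, poisoned_score = 0.60, 0.57
print(absolute_drop(clean_score, poisoned_score))
print(relative_drop(clean_score, poisoned_score))
```

For k = 1 the estimator reduces to c / n, so a benchmark-level pass@1 is simply the mean fraction of passing samples across problems.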
Problem

Research questions and friction points this paper is trying to address.

model poisoning
code large language models
adversarial attack
code style
vulnerable code generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

model poisoning
code style
stealthy trigger
code large language models
vulnerable code generation