Poison with Style: A Practical Poisoning Attack on Code Large Language Models

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies the vulnerability of code large language models (CLLMs) to stealthy poisoning and proposes Poison-with-Style (PwS), an attack that uses developers' own coding styles as implicit backdoor triggers. Unlike prior attacks, which assume an adversary who can insert explicit triggers into prompts at inference time, PwS requires no modification of user inputs. It combines style-based trigger construction, a dedicated data collection method, and a two-step fine-tuning strategy, so that the poisoned model generates vulnerable code when a prompt exhibits the trigger style and behaves normally otherwise. On Python code completion, PwS-poisoned models generate CWE-20 vulnerable code in 95% of cases when the trigger style is present, with less than a 5% drop in pass@1 on the HumanEval and MBPP benchmarks. The authors report that the approach outperforms existing attacks and remains robust against state-of-the-art defenses.
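To make "coding style" concrete, the sketch below extracts a few coarse style features from a Python snippet. It is only an illustration of what a style signal could look like. The function name `style_signature` and the three features (naming convention, type hints, quote preference) are assumptions chosen for this example, not the trigger construction used by PwS, which this summary does not describe.

```python
import ast
import re


def style_signature(source: str) -> dict:
    """Return a few coarse, hypothetical style features for Python source."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

    # Collect identifiers: variable names, function names, and argument names.
    names = [n.id for n in ast.walk(tree) if isinstance(n, ast.Name)]
    names += [f.name for f in funcs]
    names += [a.arg for f in funcs for a in f.args.args]

    # Count naming conventions (inner underscore vs. lower-to-upper transition).
    snake = sum(1 for n in names if "_" in n.strip("_"))
    camel = sum(1 for n in names if re.search(r"[a-z][A-Z]", n))

    # A function counts as annotated if it has a return or argument annotation.
    annotated = sum(
        1
        for f in funcs
        if f.returns is not None or any(a.annotation for a in f.args.args)
    )

    return {
        "prefers_snake_case": snake >= camel,
        "uses_type_hints": annotated > 0,
        "uses_double_quotes": source.count('"') >= source.count("'"),
    }


snake_style = style_signature("def load_file(file_path: str) -> str:\n    return file_path\n")
camel_style = style_signature("def loadFile(filePath):\n    return filePath\n")
```

The two snippets are functionally identical but yield different signatures, which is the kind of difference a style-based trigger would rely on. The same features could also serve a defender who wants to check whether a model's behavior changes with style alone.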

📝 Abstract

Code Large Language Models (CLLMs) serve as the core of modern code agents, enabling developers to automate complex software development tasks. In this paper, we present Poison-with-Style (PwS), a practical and stealthy model poisoning attack targeting CLLMs. Unlike prior attacks that assume an active adversary capable of directly embedding explicit triggers (e.g., specific words) into developers' prompts during inference, PwS leverages developers' code styles as covert triggers implicitly embedded within their prompts. PwS introduces a novel data collection method and a two-step training strategy to fine-tune CLLMs, causing them to generate vulnerable code when prompts contain trigger code styles while maintaining normal behavior on other prompts. Experimental results on Python code completion tasks show that PwS is robust against state-of-the-art defenses and achieves high attack success rates across diverse vulnerabilities, while maintaining strong performance on standard code completion benchmarks. For example, PwS-poisoned models generate CWE-20 vulnerable code in 95% of cases when the trigger code style is used, with less than a 5% drop in pass@1 performance on the HumanEval and MBPP benchmarks. Our implementation and dataset are here: https://github.com/khangtran2020/pws.
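The two headline numbers in the abstract can be read as simple ratios. The sketch below shows one plausible way to compute them. The helper names and the sample values are hypothetical, and the abstract does not say whether the "5% drop" is measured in absolute percentage points or relative to the clean model, so both are shown.

```python
from typing import Sequence


def attack_success_rate(vulnerable_flags: Sequence[bool]) -> float:
    """Fraction of triggered completions flagged as vulnerable."""
    if not vulnerable_flags:
        raise ValueError("need at least one completion")
    return sum(vulnerable_flags) / len(vulnerable_flags)


def pass_at_1_drop(clean: float, poisoned: float) -> tuple:
    """Return (absolute drop, relative drop) in pass@1, both as fractions."""
    if clean <= 0:
        raise ValueError("clean pass@1 must be positive")
    absolute = clean - poisoned
    return absolute, absolute / clean


# Hypothetical data: 19 of 20 triggered completions are flagged vulnerable.
asr = attack_success_rate([True] * 19 + [False])

# Hypothetical pass@1 scores for a clean and a poisoned model.
abs_drop, rel_drop = pass_at_1_drop(clean=0.60, poisoned=0.57)
```

With these made-up inputs the attack success rate is 0.95, the absolute pass@1 drop is about 3 percentage points, and the relative drop is about 5%.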

Problem

Research questions and friction points this paper is trying to address.

model poisoning

code large language models

adversarial attack

code style

vulnerable code generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

model poisoning

code style

stealthy trigger