Learning to Generate Secure Code via Token-Level Rewards

📅 2026-02-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the tendency of large language models (LLMs) to introduce security vulnerabilities in code generation, a challenge exacerbated by the scarcity of high-quality secure code data and the limitations of coarse-grained reinforcement learning rewards. To overcome these issues, the authors propose the Vul2Safe framework, which first leverages an LLM-based self-reflection mechanism to construct high-confidence vulnerability-fix pairs and then employs an implicit prompting strategy to expand them into a high-quality dataset, PrimeVul+. Building on this, they introduce SRCode, a novel training framework that, for the first time, incorporates token-level rewards in code security reinforcement learning to enable fine-grained optimization of secure coding patterns. Experiments demonstrate that the proposed approach significantly reduces security vulnerabilities in generated code across multiple benchmarks while simultaneously improving overall code quality.

📝 Abstract
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, and further generates diverse implicit prompts to build the PrimeVul+ dataset. Meanwhile, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, which enables the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level reward schemes, our approach allows for more precise optimization of local security implementations. Extensive experiments show that PrimeVul+ and SRCode substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks.
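To make the distinction between instance-level and token-level rewards concrete, here is a minimal REINFORCE-style sketch in plain Python. It is an illustration of the general idea only, not the paper's SRCode implementation: the token log-probabilities and the reward values (where security-critical tokens receive higher weight) are hypothetical.

```python
def policy_gradient_loss(log_probs, rewards):
    """REINFORCE-style loss: negative reward-weighted sum of token log-probs."""
    return -sum(r * lp for r, lp in zip(rewards, log_probs))

# Toy example: log-probabilities of 4 generated tokens (hypothetical values).
log_probs = [-0.1, -2.3, -0.5, -1.2]

# Instance-level reward: one scalar for the whole sample (e.g. "the snippet
# passed a security check"), broadcast uniformly over every token.
instance_rewards = [0.5] * len(log_probs)

# Token-level reward (hypothetical): tokens realizing a security-critical
# pattern, e.g. a sanitization call, are weighted more heavily, so the
# gradient concentrates on reinforcing exactly those tokens.
token_rewards = [0.1, 0.1, 1.0, 1.0]

coarse_loss = policy_gradient_loss(log_probs, instance_rewards)
fine_loss = policy_gradient_loss(log_probs, token_rewards)
```

With the instance-level reward, every token's log-probability is pushed up by the same amount regardless of whether it contributed to security; with the token-level reward, the update is concentrated on the security-relevant tokens, which is the fine-grained optimization the abstract describes.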
Problem

Research questions and friction points this paper is trying to address.

secure code generation
security vulnerabilities
large language models
reinforcement learning rewards
code security
Innovation

Methods, ideas, or system contributions that make the work stand out.

token-level rewards
secure code generation
LLM self-reflection
PrimeVul+
SRCode