Learning to Generate Secure Code via Token-Level Rewards

📅 2026-02-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the tendency of large language models (LLMs) to introduce security vulnerabilities in code generation, a challenge exacerbated by the scarcity of high-quality secure code data and the limitations of coarse-grained reinforcement learning rewards. To overcome these issues, the authors propose the Vul2Safe framework, which first leverages an LLM-based self-reflection mechanism to construct high-confidence vulnerability-fix pairs and then employs an implicit prompting strategy to expand them into a high-quality dataset, PrimeVul+. Building on this, they introduce SRCode, a novel training framework that, for the first time, incorporates token-level rewards in code security reinforcement learning to enable fine-grained optimization of secure coding patterns. Experiments demonstrate that the proposed approach significantly reduces security vulnerabilities in generated code across multiple benchmarks while simultaneously improving overall code quality.

📝 Abstract
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities, and further generates diverse implicit prompts to build the PrimeVul+ dataset. Meanwhile, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, which enables the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level reward schemes, our approach allows for more precise optimization of local security implementations. Extensive experiments show that PrimeVul+ and SRCode substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks.
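To make the distinction between instance-level and token-level rewards concrete, here is a minimal REINFORCE-style sketch in plain Python. It is an illustration of the general idea only, not the paper's SRCode implementation: the token log-probabilities and the reward values (where security-critical tokens receive higher weight) are hypothetical.

```python
def policy_gradient_loss(log_probs, rewards):
    """REINFORCE-style loss: negative reward-weighted sum of token log-probs."""
    return -sum(r * lp for r, lp in zip(rewards, log_probs))

# Toy example: log-probabilities of 4 generated tokens (hypothetical values).
log_probs = [-0.1, -2.3, -0.5, -1.2]

# Instance-level reward: one scalar for the whole sample (e.g. "the snippet
# passed a security check"), broadcast uniformly over every token.
instance_rewards = [0.5] * len(log_probs)

# Token-level reward (hypothetical): tokens realizing a security-critical
# pattern, e.g. a sanitization call, are weighted more heavily, so the
# gradient concentrates on reinforcing exactly those tokens.
token_rewards = [0.1, 0.1, 1.0, 1.0]

coarse_loss = policy_gradient_loss(log_probs, instance_rewards)
fine_loss = policy_gradient_loss(log_probs, token_rewards)
```

With the instance-level reward, every token's log-probability is pushed up by the same amount regardless of whether it contributed to security; with the token-level reward, the update is concentrated on the security-relevant tokens, which is the fine-grained optimization the abstract describes.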
Problem

Research questions and friction points this paper is trying to address.

secure code generation
security vulnerabilities
large language models
reinforcement learning rewards
code security
Innovation

Methods, ideas, or system contributions that make the work stand out.

token-level rewards
secure code generation
LLM self-reflection
PrimeVul+
SRCode