🤖 AI Summary
This work addresses the tendency of large language models (LLMs) to introduce security vulnerabilities when generating code, a problem made worse by the scarcity of high-quality secure code data and by coarse-grained reinforcement learning rewards. To overcome these issues, the authors propose the Vul2Safe framework. It first uses an LLM-based self-reflection mechanism to construct high-confidence vulnerability-fix pairs, then applies an implicit prompting strategy to expand them into a high-quality dataset, PrimeVul+. Building on this, they introduce SRCode, a training framework that, by the authors' account, is the first to incorporate token-level rewards into reinforcement learning for code security, enabling fine-grained optimization of secure coding patterns. Experiments show that the approach substantially reduces security vulnerabilities in generated code across multiple benchmarks while also improving overall code quality.
📝 Abstract
Large language models (LLMs) have demonstrated strong capabilities in code generation, yet they remain prone to producing security vulnerabilities. Existing approaches commonly suffer from two key limitations: the scarcity of high-quality security data and coarse-grained reinforcement learning reward signals. To address these challenges, we propose Vul2Safe, a new secure code generation framework that leverages LLM self-reflection to construct high-confidence repair pairs from real-world vulnerabilities and then generates diverse implicit prompts to build the PrimeVul+ dataset. In addition, we introduce SRCode, a novel training framework that pioneers the use of token-level rewards in reinforcement learning for code security, enabling the model to continuously attend to and reinforce critical fine-grained security patterns during training. Compared with traditional instance-level reward schemes, our approach allows more precise optimization of local security implementations. Extensive experiments show that PrimeVul+ and SRCode substantially reduce security vulnerabilities in generated code while improving overall code quality across multiple benchmarks.
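The abstract contrasts token-level rewards with instance-level ones but does not say how SRCode computes or applies them. The following is a minimal, generic sketch of the difference in credit assignment under a REINFORCE-style policy-gradient objective. It is not the authors' implementation. The function names, the discounted reward-to-go formulation, and the toy per-token rewards (penalizing only an unsafe `strcpy` call) are all hypothetical illustrations.

```python
from typing import List, Sequence


def instance_level_advantages(num_tokens: int, reward: float) -> List[float]:
    # One scalar reward for the whole completion is copied to every token,
    # so benign and vulnerable tokens receive identical credit.
    return [reward] * num_tokens


def token_level_advantages(token_rewards: Sequence[float], gamma: float = 0.0) -> List[float]:
    # Hypothetical dense scheme: discounted reward-to-go,
    # A_t = sum_{k>=t} gamma**(k-t) * r_k.
    # gamma=0.0 keeps each reward on its own token; gamma=1.0 propagates
    # later rewards back to all earlier tokens.
    advantages = [0.0] * len(token_rewards)
    running = 0.0
    for t in range(len(token_rewards) - 1, -1, -1):
        running = token_rewards[t] + gamma * running
        advantages[t] = running
    return advantages


def policy_gradient_loss(logprobs: Sequence[float], advantages: Sequence[float]) -> float:
    # REINFORCE-style surrogate loss: -sum_t A_t * log pi(y_t | context).
    if len(logprobs) != len(advantages):
        raise ValueError("logprobs and advantages must align token by token")
    return -sum(a * lp for a, lp in zip(advantages, logprobs))


if __name__ == "__main__":
    tokens = ["char", "buf[8]", ";", "strcpy", "(buf, s)", ";"]
    # Toy per-token reward (hypothetical): only the unsafe call is penalized.
    token_rewards = [0.0, 0.0, 0.0, -1.0, 0.0, 0.0]
    print(instance_level_advantages(len(tokens), -1.0))  # [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0]
    print(token_level_advantages(token_rewards))          # [0.0, 0.0, 0.0, -1.0, 0.0, 0.0]
```

With a single instance-level reward, the gradient pushes down every token of the vulnerable sample, including harmless ones such as the buffer declaration. A per-token signal concentrates the update on the span that actually introduces the flaw, which is the kind of localized optimization the abstract describes.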