SecCodePRM: A Process Reward Model for Code Security

📅 2026-02-11
🤖 AI Summary
Existing code security detectors struggle to provide real-time, fine-grained feedback during interactive or streaming code generation, and their performance degrades on long code sequences. To address these limitations, this work proposes SecCodePRM, a process reward model for code security that enables, reportedly for the first time, prefix-level, context-aware, stepwise security scoring. Step-level training labels are derived from static analyzers and expert annotations. For vulnerability detection on both partial and complete code, the model applies a risk-sensitive aggregation that emphasizes high-risk steps; for secure code generation, it ranks candidate continuations by reward at inference time. The reported experiments show SecCodePRM outperforming existing approaches on all three tasks while preserving functional correctness, without relying on full-context inputs or sparse end-of-completion feedback.

📝 Abstract
Large language models (LLMs) are rapidly becoming core components of modern software development workflows, yet ensuring code security remains challenging. Existing vulnerability detection pipelines either rely on static analyzers or use LLM/GNN-based detectors trained with coarse program-level supervision. Both families often require complete context, provide sparse end-of-completion feedback, and can degrade as code length grows, making them ill-suited for real-time, prefix-level assessment during interactive coding and streaming generation. We propose SecCodePRM, a security-oriented process reward model that assigns a context-aware, step-level security score along a code trajectory. To train the model, we derive step-level supervision labels from static analyzers and expert annotations, allowing the model to attend more precisely to fine-grained regions associated with inter-procedural vulnerabilities. SecCodePRM has three applications: full-code vulnerability detection (VD), partial-code VD, and secure code generation (CG). For VD, SecCodePRM uses risk-sensitive aggregation that emphasizes high-risk steps; for CG, SecCodePRM supports inference-time scaling by ranking candidate continuations and favoring higher cumulative reward. This design yields dense, real-time feedback that scales to long-horizon generation. Empirically, SecCodePRM outperforms prior approaches in all three settings while preserving functional correctness, suggesting improved security without a safety-utility tradeoff.
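The abstract names two ways of using step-level scores: a risk-sensitive aggregation for VD and a ranking of candidates by cumulative reward for CG. The Python sketch below illustrates both ideas under stated assumptions and is not the paper's implementation. The abstract does not give the aggregation formula, so the blend of worst-step and mean score, the `alpha` weight, the `threshold`, and the convention that scores lie in [0, 1] with higher meaning safer are all assumptions. `toy_score_steps` is a hypothetical stand-in for the reward model that simply penalizes lines containing `eval(` or `os.system(`.

```python
from typing import Callable, Sequence


def risk_sensitive_score(step_scores: Sequence[float], alpha: float = 0.8) -> float:
    """Aggregate step scores so that the riskiest step dominates.

    ASSUMPTION: scores are in [0, 1], higher = more secure, and the
    aggregate is alpha * worst + (1 - alpha) * mean. The paper's actual
    aggregation rule is not stated in the abstract.
    """
    if not step_scores:
        raise ValueError("need at least one step score")
    worst = min(step_scores)
    mean = sum(step_scores) / len(step_scores)
    return alpha * worst + (1.0 - alpha) * mean


def is_vulnerable(step_scores: Sequence[float],
                  threshold: float = 0.5,
                  alpha: float = 0.8) -> bool:
    """Flag a (partial or complete) program whose aggregate falls below a threshold."""
    return risk_sensitive_score(step_scores, alpha) < threshold


def rank_candidates(candidates: Sequence[str],
                    score_steps: Callable[[str], Sequence[float]]) -> list[str]:
    """Order candidate continuations by cumulative (summed) step reward, best first."""
    return sorted(candidates, key=lambda c: sum(score_steps(c)), reverse=True)


def toy_score_steps(code: str) -> list[float]:
    """HYPOTHETICAL stand-in for the reward model: one score per line."""
    risky = ("eval(", "os.system(")
    return [0.1 if any(tok in line for tok in risky) else 0.9
            for line in code.splitlines()]


if __name__ == "__main__":
    # One low-scoring step among otherwise safe steps.
    scores = [0.9, 0.9, 0.1, 0.9]
    print(risk_sensitive_score(scores), is_vulnerable(scores))

    unsafe = "x = input()\nresult = eval(x)"
    safe = "x = input()\nresult = int(x)"
    print(rank_candidates([unsafe, safe], toy_score_steps)[0])
```

With these assumed settings, a single low-scoring step pulls the aggregate below the threshold even though the plain mean (0.7) would not, which is the intuition behind emphasizing high-risk steps. Note that a raw sum favors longer candidates; a length-normalized variant may be preferable when continuations differ in length.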
Problem

Research questions and friction points this paper is trying to address.

code security
vulnerability detection
process reward model
real-time feedback
prefix-level assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Process Reward Model
Step-level Security Scoring
Secure Code Generation
Vulnerability Detection
Real-time Feedback