CodeRL+: Improving Code Generation via Reinforcement with Execution Semantics Alignment

📅 2025-10-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) face a fundamental semantic gap between their training on textual patterns and the execution-level correctness that code generation demands. Existing execution-based reward methods, which rely solely on binary pass/fail signals, fail to capture subtle logical errors and therefore align generated text only weakly with intended execution semantics. To address this, the paper proposes variable-level execution trajectory modeling, which establishes fine-grained alignment between code text and its runtime state. This enables verifiable, dense semantic-consistency rewards, constructed directly from on-policy rollouts without external oracles and compatible with diverse reinforcement learning algorithms. Experiments show consistent improvements: a 4.6% average relative improvement in pass@1 across benchmarks; 15.5% and 4.4% higher accuracy on code-reasoning and test-output-generation tasks, respectively; and strong generalization across model architectures and RL algorithms.

📝 Abstract
While Large Language Models (LLMs) excel at code generation by learning from vast code corpora, a fundamental semantic gap remains between their training on textual patterns and the goal of functional correctness, which is governed by formal execution semantics. Reinforcement Learning with Verifiable Rewards (RLVR) approaches attempt to bridge this gap using outcome rewards from executing test cases. However, solely relying on binary pass/fail signals is inefficient for establishing a well-aligned connection between the textual representation of code and its execution semantics, especially for subtle logical errors within the code. In this paper, we propose CodeRL+, a novel approach that integrates execution semantics alignment into the RLVR training pipeline for code generation. CodeRL+ enables the model to infer variable-level execution trajectory, providing a direct learning signal of execution semantics. CodeRL+ can construct execution semantics alignment directly using existing on-policy rollouts and integrates seamlessly with various RL algorithms. Extensive experiments demonstrate that CodeRL+ outperforms post-training baselines (including RLVR and Distillation), achieving a 4.6% average relative improvement in pass@1. CodeRL+ generalizes effectively to other coding tasks, yielding 15.5% and 4.4% higher accuracy on code-reasoning and test-output-generation benchmarks, respectively. CodeRL+ shows strong applicability across diverse RL algorithms and LLMs. Furthermore, probe analyses provide compelling evidence that CodeRL+ strengthens the alignment between code's textual representations and its underlying execution semantics.
Problem

Research questions and friction points this paper is trying to address.

Bridging semantic gap between code text and execution semantics
Improving reinforcement learning for functional code generation
Addressing inefficiency of binary test signals in training
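To make the notion of a variable-level execution trajectory concrete, the sketch below instruments Python execution with `sys.settrace` and snapshots local variables at each executed line. This is an illustrative assumption about what such a trajectory could look like; the paper's actual instrumentation and trace format are not specified in this summary.

```python
import sys

def capture_trajectory(fn, *args):
    """Record a variable-level execution trajectory for fn(*args).

    Illustrative sketch: snapshots the local variables of fn at every
    executed line, yielding (line_number, locals) pairs.
    """
    trajectory = []

    def tracer(frame, event, arg):
        # Only record 'line' events inside fn's own frame.
        if event == "line" and frame.f_code is fn.__code__:
            trajectory.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)  # always remove the tracer
    return result, trajectory

def running_sum(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, trace = capture_trajectory(running_sum, 3)
# result == 3; trace holds per-line (lineno, locals) snapshots,
# including the evolving values of total and i.
```

A trajectory like this pairs each line of code with its runtime state, which is exactly the kind of fine-grained text-to-semantics alignment the binary pass/fail signal lacks.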
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates execution semantics alignment into RL training
Enables variable-level execution trajectory inference for learning
Constructs alignment signals from existing on-policy rollouts, compatible with diverse RL algorithms
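To illustrate how a dense semantic-consistency reward differs from a binary outcome reward, the hypothetical scoring rule below grants partial credit for the fraction of variable snapshots on which a predicted trajectory agrees with the observed one. The exact reward formulation in the paper is not given in this summary; this is one plausible instantiation.

```python
def dense_consistency_reward(predicted, actual):
    """Score agreement between a predicted and an observed variable
    trajectory, each a list of {variable: value} snapshots.

    Hypothetical rule: reward = fraction of aligned snapshots that
    match exactly, a dense signal in [0, 1] rather than pass/fail.
    """
    if not actual:
        return 0.0
    matches = sum(1 for pred, act in zip(predicted, actual) if pred == act)
    return matches / len(actual)

pred = [{"total": 0}, {"total": 1}, {"total": 3}]
act  = [{"total": 0}, {"total": 1}, {"total": 2}]
# dense_consistency_reward(pred, act) returns 2/3: partial credit for a
# nearly correct program that a binary test reward would score as 0.
```

The design point is that a program with one subtle logical error still earns a graded signal proportional to how much of its execution matches the intended semantics, instead of collapsing to the same zero reward as a completely wrong program.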
👥 Authors
Xue Jiang, School of Computer Science, Peking University
Yihong Dong, Peking University (Code Generation; Large Language Models)
Mengyang Liu, City University of Hong Kong (Deep Learning; Computer Vision; AIGC)
Hongyi Deng, School of Computer Science, Peking University
Tian Wang, School of Computer Science, Peking University
Yongding Tao, Peking University (LLM; Code Intelligence)
Rongyu Cao, Chinese Academy of Sciences (Data Mining)
Binhua Li, Tongyi Lab, Alibaba Group
Zhi Jin, Sun Yat-Sen University (Associate Professor)
Wenpin Jiao, School of Computer Science, Peking University
Fei Huang, Tongyi Lab, Alibaba Group
Yongbin Li, Tongyi Lab, Alibaba Group
Ge Li, Peking University (Full Professor of Computer Science; Program Analysis; Program Generation; Deep Learning)