🤖 AI Summary
Large language models (LLMs) frequently generate code with suboptimal runtime efficiency, limiting their deployment in performance-critical applications. To address this, we propose an efficiency-aware reinforcement learning framework that jointly optimizes code correctness and execution efficiency via a dynamic exploration mechanism, error-insensitive reward modeling, and a two-stage fine-tuning strategy. Our method integrates offline pretraining, online fine-tuning, and performance-driven fine-grained reward modeling, thereby overcoming reliance on static datasets. Experiments on a 7B-parameter LLM demonstrate a 10.18% absolute improvement in functional correctness and a 7.75% reduction in average execution time, achieving performance competitive with significantly larger models. To our knowledge, this is the first work to empirically verify simultaneous, measurable gains in both correctness and efficiency for LLM-generated code.
📝 Abstract
While code large language models have demonstrated remarkable progress in code generation, the generated code often exhibits poor runtime efficiency, limiting its practical application in performance-sensitive scenarios. To address this limitation, we propose an efficiency-oriented reinforcement learning framework guided by a novel performance reward. Based on this framework, we take a deeper dive into the code efficiency problem, identifying key bottlenecks and proposing methods to overcome them: (1) Dynamic exploration overcomes the static data constraints of offline fine-tuning, enabling the discovery of more efficient code implementations. (2) The error-insensitive reinforcement learning method and high-contrast efficiency signals are crucial for mitigating systematic errors and achieving effective optimization. (3) Online exploration is most effective when starting from a high-correctness baseline, as this allows for efficiency improvements without sacrificing accuracy. Building on these findings, we propose a two-stage tuning method that achieves high and balanced performance across correctness and efficiency. Experimental results demonstrate the effectiveness of the method: on a 7B model, it improves code correctness by 10.18% and runtime efficiency by 7.75%, achieving performance comparable to much larger models.
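The abstract does not spell out the performance reward, but the core idea (a correctness gate combined with a bounded, high-contrast efficiency signal) can be sketched as follows. This is an illustrative assumption, not the paper's actual formula: the function name, the speedup-against-baseline signal, the clipping bound, and the `alpha` weighting are all hypothetical choices made here for clarity.

```python
def performance_reward(passed: bool, runtime: float,
                       baseline_runtime: float, alpha: float = 0.5) -> float:
    """Hedged sketch of an efficiency-aware reward for RL fine-tuning.

    passed: whether the generated code passed all functional tests.
    runtime / baseline_runtime: measured execution times (seconds).
    alpha: weight of the efficiency term relative to plain correctness.
    """
    # Correctness gate: incorrect code earns no reward at all, so the
    # policy cannot trade accuracy away for speed.
    if not passed:
        return 0.0
    # High-contrast efficiency signal: relative speedup over a reference
    # solution, clipped to [0, 2] and normalized to [0, 1] so that small
    # timing noise does not dominate the reward.
    speedup = baseline_runtime / max(runtime, 1e-9)
    efficiency = max(0.0, min(speedup, 2.0)) / 2.0
    # Blend a flat correctness reward with the efficiency bonus.
    return (1.0 - alpha) + alpha * efficiency
```

Under this sketch, a correct solution matching the baseline speed scores 0.75, a correct solution twice as fast scores 1.0, and any failing solution scores 0.0, giving the optimizer a clear, bounded gradient toward faster correct code.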