ROCODE: Integrating Backtracking Mechanism and Program Analysis in Large Language Models for Code Generation

📅 2024-11-11
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Autoregressive code generation by large language models (LLMs) suffers from irreversible error accumulation: once an erroneous token is generated, it propagates and cannot be corrected retroactively. Method: We propose a real-time error detection and intrinsic backtracking correction mechanism integrated into the generation process. Our approach couples program-analysis-driven incremental error detection, unifying static and dynamic analysis, with LLMs' intrinsic backtracking capability: upon detecting an error, it triggers an immediate rollback and a constrained re-generation of correct code. We further design a model-agnostic interface to support deployment across diverse LLMs. Contribution/Results: Experiments demonstrate a 99.1% compilation success rate, up to a 23.8% improvement in test pass rate, and a 19.3% reduction in token consumption. The method delivers consistent gains across nine mainstream LLMs. By enabling in-process revision of previously generated tokens, our work relaxes the autoregressive constraint of immutable history, establishing a paradigm for trustworthy, self-correcting code generation.
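
To make the mechanism concrete, here is a minimal sketch of the detect-and-backtrack loop in Python. All names (llm.next_line, llm.is_done, detect_error) are hypothetical stand-ins rather than the paper's API, and the sketch only rejects the most recent line, whereas ROCODE's rollback strategies can step further back:

    # Minimal sketch, assuming a line-level LLM wrapper and an error
    # detector; names are hypothetical, not ROCODE's actual interface.
    def generate_with_backtracking(llm, prompt, detect_error,
                                   max_lines=200, max_rollbacks=10):
        lines = []      # accepted code so far
        banned = {}     # position -> continuations already rejected there
        rollbacks = 0
        while len(lines) < max_lines and not llm.is_done(prompt, lines):
            pos = len(lines)
            # Constrained re-generation: steer away from rejected candidates.
            candidate = llm.next_line(prompt, lines, avoid=banned.get(pos, set()))
            if detect_error(prompt, lines + [candidate]) and rollbacks < max_rollbacks:
                # Error detected: roll back immediately instead of
                # conditioning later code on the faulty line.
                banned.setdefault(pos, set()).add(candidate)
                rollbacks += 1
                continue
            lines.append(candidate)
        return "\n".join(lines)

Because the check runs on every accepted prefix, an error is discarded before any further tokens are conditioned on it, which is where the token savings over post-revising come from.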

📝 Abstract
Large language models (LLMs) have recently achieved impressive performance in code generation, offering programmers revolutionary assistance in software development. However, due to the auto-regressive nature of LLMs, they are susceptible to error accumulation during code generation. Once an error is produced, LLMs can only continue to generate the subsequent code conditioned on it, given their inability to adjust previous outputs. Existing LLM-based approaches typically consider post-revising after code generation, which makes accumulated errors hard to resolve and wastes significant resources. Ideally, LLMs should roll back and resolve an error as soon as it occurs during code generation, rather than proceeding on the basis of the error and waiting for post-revising after generation. In this paper, we propose ROCODE, which integrates a backtracking mechanism and program analysis into LLMs for code generation. Specifically, we employ program analysis to perform incremental error detection during the generation process. When an error is detected, the backtracking mechanism is triggered to apply rollback strategies and constrained regeneration, thereby eliminating the error early and ensuring continued generation on a correct basis. Experiments on multiple code generation benchmarks show that ROCODE significantly reduces the errors generated by LLMs, with a compilation pass rate of 99.1%. The test pass rate is improved by up to 23.8% compared to the best baseline approach. Compared to the post-revising baseline, the token cost is reduced by 19.3%. Moreover, our approach is model-agnostic and achieves consistent improvements across nine representative LLMs.
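
One inexpensive way to approximate the static half of such incremental detection in Python is the standard-library codeop module, which distinguishes a prefix that is merely unfinished from one that is already invalid. This illustrates the idea only; the paper's analyzer also unifies dynamic analysis:

    import codeop

    def partial_syntax_error(partial_code: str) -> bool:
        """True only if the partial program is already unsalvageable;
        incomplete-but-valid prefixes pass the check."""
        try:
            # compile_command returns a code object if the source is
            # complete, None if it is incomplete, and raises if it is
            # invalid regardless of how it might continue.
            codeop.compile_command(partial_code, symbol="exec")
            return False
        except (SyntaxError, ValueError, OverflowError):
            return True

    print(partial_syntax_error("def add(a, b):"))        # False: unfinished, not wrong
    print(partial_syntax_error("def add(a, b) return"))  # True: missing colon, already invalid

A checker like this can run after every generated line, so the rollback fires on the first line that can no longer lead to a compilable program.
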
Problem

Research questions and friction points this paper is trying to address.

Reducing error accumulation in LLM-based code generation
Integrating real-time error detection and backtracking in code generation
Improving compilation and test pass rates while reducing resource waste
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates a backtracking mechanism into LLM decoding
Uses program analysis for incremental error detection during generation
Model-agnostic interface, validated across nine LLMs (sketched below)
Reduces errors and token cost significantly
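
The model-agnostic deployment boils down to demanding only a narrow contract from each backend. A possible shape for that contract, matching the loop sketched under the AI summary (hypothetical, not the paper's actual adapter):

    from typing import Protocol, Sequence, Set

    class CodeLLM(Protocol):
        """Minimal backend contract assumed by the sketch above."""

        def next_line(self, prompt: str, lines: Sequence[str],
                      avoid: Set[str]) -> str:
            """Propose the next code line, steering clear of 'avoid'."""
            ...

        def is_done(self, prompt: str, lines: Sequence[str]) -> bool:
            """Report whether the generated program is complete."""
            ...

In this framing, each of the nine evaluated LLMs would sit behind an adapter like this, so the backtracking loop and the program-analysis detector never need to know which model is generating.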
👥 Authors

Xue Jiang
Key Lab of High Confidence Software Technology, MoE (Peking University), Beijing, China

Yihong Dong
Peking University
Code Generation · Large Language Models

Yongding Tao
Peking University
LLM · Code Intelligence

Huanyu Liu
Key Lab of High Confidence Software Technology, MoE (Peking University), Beijing, China

Zhi Jin
Sun Yat-Sen University, Associate Professor

Wenpin Jiao
Key Lab of High Confidence Software Technology, MoE (Peking University), Beijing, China

Ge Li
Full Professor of Computer Science, Peking University
Program Analysis · Program Generation · Deep Learning