Planning-Driven Programming: A Large Language Model Programming Workflow

📅 2024-11-21
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low code accuracy, poor debugging efficiency, and inconsistent reasoning in large language model (LLM)-based code generation, this paper proposes a plan-driven two-stage programming paradigm. In the first stage, the model generates a natural-language problem-solving plan and explicitly verifies its correctness; in the second stage, it synthesizes executable code conditioned on the verified plan and iteratively refines the output using test feedback. This approach uniquely establishes plan verification as a unified grounding mechanism for both code generation and repair, significantly enhancing interpretability and debuggability. Implemented with GPT-4o, our method achieves new state-of-the-art Pass@1 scores across five major benchmarks—including HumanEval (98.2%) and MBPP (84.8%)—with maximum improvements of 16.4 percentage points, demonstrating that structured planning substantially augments LLMs’ code generation capabilities.

📝 Abstract
The strong performance of large language models (LLMs) raises extensive discussion on their application to code generation. Recent research suggests continuous program refinements through visible tests to improve code generation accuracy in LLMs. However, these methods suffer from LLMs' inefficiency and limited reasoning capacity. In this work, we propose an LLM programming workflow (LPW) designed to improve both initial code generation and subsequent refinements within a structured two-phase workflow. Specifically, the solution generation phase formulates a solution plan, which is then verified through visible tests to specify the intended natural language solution. Subsequently, the code implementation phase drafts an initial code according to the solution plan and its verification. If the generated code fails the visible tests, the plan verification serves as the intended solution to consistently inform the refinement process for correcting bugs. Compared to state-of-the-art methods across various existing LLMs, LPW significantly improves the Pass@1 accuracy by up to 16.4% on well-established text-to-code generation benchmarks. LPW also sets new state-of-the-art Pass@1 accuracy, achieving 98.2% on HumanEval, 84.8% on MBPP, 59.3% on LiveCode, 62.6% on APPS, and 34.7% on CodeContest, using GPT-4o as the backbone.
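The two-phase workflow described in the abstract can be sketched as a simple control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm_*` functions are hypothetical stand-ins for real LLM calls (stubbed here so the sketch runs on a toy task), and the visible tests are plain input/output pairs.

```python
# Minimal sketch of the LPW two-phase workflow.
# The llm_* functions are hypothetical placeholders for LLM calls;
# here they are stubbed to solve a toy "add two integers" task.

def llm_generate_plan(problem: str) -> str:
    # Phase 1a: draft a natural-language solution plan (stubbed).
    return "Return the sum of the two inputs."

def llm_verify_plan(plan: str, visible_tests) -> bool:
    # Phase 1b: walk the plan through each visible test in natural
    # language and check the expected outputs (stubbed as passing).
    return True

def llm_draft_code(problem: str, plan: str, feedback: str = "") -> str:
    # Phase 2: synthesize code conditioned on the verified plan,
    # optionally using feedback from a failed attempt (stubbed).
    return "def solve(a, b):\n    return a + b"

def lpw(problem, visible_tests, max_refinements=3):
    # Solution generation phase: formulate and verify a plan.
    plan = llm_generate_plan(problem)
    if not llm_verify_plan(plan, visible_tests):
        plan = llm_generate_plan(problem)  # re-plan on failed verification
    # Code implementation phase: draft, then refine on test failures.
    feedback = ""
    code = ""
    for _ in range(max_refinements):
        code = llm_draft_code(problem, plan, feedback)
        namespace = {}
        exec(code, namespace)  # run the candidate against visible tests
        solve = namespace["solve"]
        failures = [(args, want, solve(*args))
                    for args, want in visible_tests
                    if solve(*args) != want]
        if not failures:
            return code  # all visible tests pass
        # The verified plan serves as the intended solution that,
        # together with failing cases, grounds the next refinement.
        feedback = f"Failed cases: {failures}; intended solution: {plan}"
    return code

visible_tests = [((1, 2), 3), ((0, 0), 0)]
program = lpw("Add two integers.", visible_tests)
```

The key design point from the paper is that the plan verification produced in phase one is reused as the grounding reference during every refinement iteration, rather than refining against test failures alone.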
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Code Generation
Accuracy and Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM Programming Workflow
Code Accuracy Improvement
GPT-4o Benchmark
Chao Lei
The University of Melbourne, School of Computing and Information Systems
Automated Planning Heuristic Search Artificial Intelligence
Yanchuan Chang
The University of Melbourne
Deep Learning Spatial Database Large Language Model
N. Lipovetzky
School of Computing and Information Systems, The University of Melbourne, Australia
Krista A. Ehinger
School of Computing and Information Systems, The University of Melbourne, Australia