CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging

📅 2025-02-08
🤖 AI Summary
Problem: Existing code generation methods rely on external tools (e.g., compilers) for iterative debugging, making performance heavily dependent on the quality of the initially generated code.
Method: This paper proposes a multi-agent, simulation-driven framework that replaces traditional external debugging with an end-to-end, input/output-based, semantic-level stepwise program simulation mechanism, unifying algorithmic planning, code generation, and internal debugging under a human-like perception paradigm.
Contribution/Results: Key innovations include (1) the first use of visualizable program simulation for end-to-end planning verification; (2) a semantic-level simulation engine, phased task decomposition, and reflective reasoning; and (3) seamless cascading with external debuggers. The framework achieves new state-of-the-art pass@1 scores on HumanEval (95.1%), MBPP (90.7%), APPS (22.0%), and CodeContests (29.1%), significantly reducing reliance on initial code quality.

📝 Abstract
Large Language Models (LLMs) have made significant strides in code generation and problem solving. Current approaches employ external tool-based iterative debuggers that use compiler or other tool-based runtime feedback to refine coarse programs generated by various methods. However, the effectiveness of these approaches heavily relies on the quality of the initial code generation, which remains an open challenge. In this paper, we introduce CodeSim, a novel multi-agent code generation framework that comprehensively addresses the stages of program synthesis (planning, coding, and debugging) through a human-like perception approach. Just as humans verify their understanding of an algorithm through visual simulation, CodeSim uniquely features a method of plan verification and internal debugging through the step-by-step simulation of input/output. Extensive experiments across seven challenging competitive problem-solving and program synthesis benchmarks demonstrate CodeSim's remarkable code generation capabilities. Our framework achieves new state-of-the-art pass@1 results: HumanEval 95.1%, MBPP 90.7%, APPS 22.0%, and CodeContests 29.1%. Furthermore, our method shows potential for even greater enhancement when cascaded with external debuggers. To facilitate further research and development in this area, we have open-sourced our framework at https://kagnlp.github.io/codesim.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Reliance on external tool-based iterative debuggers (e.g., compilers)
Performance heavily dependent on the quality of initially generated code
No verification of algorithmic plans before code is produced
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent code generation framework
Simulation-driven planning and debugging
Step-by-step input/output verification
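The generate-simulate-debug loop described by these bullets can be sketched in miniature as follows. This is an illustrative sketch only: the function names (plan_and_generate, simulate, solve) are hypothetical stand-ins, not the paper's actual API, and the LLM planning/coding agents are replaced by a hard-coded toy program.

```python
# Hypothetical sketch of a CodeSim-style loop: generate a candidate program,
# verify it by simulating sample input/output pairs, and retry on failure.
# All names are illustrative stand-ins, not the framework's real interface.

def plan_and_generate(problem):
    # Stand-in for the LLM planning + coding agents.
    # Here we hard-code a toy solution for demonstration.
    return lambda x: x * 2

def simulate(program, examples):
    # Step through each sample input and compare against the expected
    # output, mimicking input/output-based plan verification.
    failures = []
    for inp, expected in examples:
        actual = program(inp)
        if actual != expected:
            failures.append((inp, expected, actual))
    return failures

def solve(problem, examples, max_rounds=3):
    for _ in range(max_rounds):
        program = plan_and_generate(problem)
        if not simulate(program, examples):
            return program  # simulation passed on all samples
        # In the real framework, failures would be fed back to a
        # debugging agent before the next round; omitted here.
    return None

solution = solve("double the input", [(1, 2), (5, 10)])
print(solution(7))  # prints 14
```

In the actual framework the simulation is semantic (an agent reasons step by step over the plan), not concrete execution as here; this sketch only illustrates the control flow of verifying against sample I/O before accepting a program.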
Md. Ashraful Islam
Bangladesh University of Engineering and Technology (BUET)
Mohammed Eunus Ali
Monash University
Spatial Databases · Urban Computing · Artificial Intelligence
Md. Rizwan Parvez
Qatar Computing Research Institute (QCRI)