🤖 AI Summary
Large language models (LLMs) face persistent bottlenecks in complex programming tasks, including inefficient iterative debugging, weak error handling, and poor adaptability to diverse problem structures, while existing fine-tuning and self-repair approaches struggle to balance efficiency with knowledge reuse. To address these challenges, we propose MemoCoder, a multi-agent collaborative framework. Its core contributions are: (1) a Fixing Knowledge Set that identifies error patterns across tasks, continuously accumulates repair knowledge, and reuses it through retrieval augmentation; and (2) a centralized, guidance-oriented Mentor Agent that dynamically refines repair strategies and orchestrates collaborative self-repair across agents. Combining a multi-agent architecture, LLM-based reasoning, and iterative repair, MemoCoder achieves substantial improvements over zero-shot prompting and state-of-the-art self-repair methods on MBPP, HumanEval, and LiveCodeBench, yielding Pass@10 gains of 3.1–12.1% and Pass@50 gains of 1.4–14.5%.
📝 Abstract
With the widespread adoption of Large Language Models (LLMs) such as GitHub Copilot and ChatGPT, developers increasingly rely on AI-assisted tools to support code generation. While LLMs can generate syntactically correct solutions for well-structured programming tasks, they often struggle with challenges that require iterative debugging, error handling, or adaptation to diverse problem structures. Existing approaches such as fine-tuning and self-repair either require costly retraining or lack mechanisms to accumulate and reuse knowledge from previous attempts.
To address these limitations, we propose MemoCoder, a multi-agent framework that enables collaborative problem solving and persistent learning from past fixes. At the core of MemoCoder is a Fixing Knowledge Set, which stores successful repairs and supports retrieval for future tasks. A central Mentor Agent supervises the repair process by identifying recurring error patterns and refining high-level fixing strategies, providing a novel supervisory role that guides the self-repair loop. We evaluate MemoCoder on three public benchmarks (MBPP, HumanEval, and LiveCodeBench) spanning a range of problem complexities. Experimental results show that MemoCoder consistently outperforms both zero-shot prompting and a Self-Repair baseline, with improvements of 3.1% to 12.1% in Pass@10 and 1.4% to 14.5% in Pass@50, demonstrating its effectiveness in iterative refinement and knowledge-guided code generation.
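The interplay described above can be sketched in miniature. The sketch below is illustrative only: the class and function names (`FixingKnowledgeSet`, `repair_loop`, `buggy_div`, `safe_div`) are hypothetical, the error signature is reduced to an exception class name, and plain Python functions stand in for LLM-generated revisions that a Mentor Agent would actually steer.

```python
from collections import defaultdict


class FixingKnowledgeSet:
    """Illustrative store of past repair knowledge, keyed by error signature."""

    def __init__(self):
        self._entries = defaultdict(list)

    def signature(self, exc: BaseException) -> str:
        # Coarse key: the exception class name. The paper's cross-task
        # error-pattern identification is presumably much richer.
        return type(exc).__name__

    def record(self, exc: BaseException, fix_hint: str) -> None:
        self._entries[self.signature(exc)].append(fix_hint)

    def retrieve(self, exc: BaseException) -> list:
        return self._entries[self.signature(exc)]


def repair_loop(candidates, tests, knowledge, max_iters=5):
    """Run candidate solutions against tests; on failure, consult and update
    the knowledge set. `candidates` stands in for successive LLM revisions."""
    for fn in candidates[:max_iters]:
        try:
            for args, expected in tests:
                assert fn(*args) == expected
            return fn  # all tests pass: a successful repair
        except Exception as exc:
            hints = knowledge.retrieve(exc)  # prior fixes for this error pattern
            # A real Mentor Agent would fold `hints` into the next revision
            # prompt; here we only log the observed failure pattern.
            knowledge.record(exc, f"failed with {knowledge.signature(exc)}")
    return None


def buggy_div(a, b):
    return a / b                      # crashes when b == 0


def safe_div(a, b):
    return a / b if b else 0          # the "repaired" revision


kb = FixingKnowledgeSet()
best = repair_loop([buggy_div, safe_div],
                   tests=[((6, 3), 2.0), ((1, 0), 0)],
                   knowledge=kb)
```

After the loop, `best` is the repaired candidate, and the knowledge set holds a `ZeroDivisionError` entry that later tasks could retrieve, which is the cross-task reuse the framework's Fixing Knowledge Set is designed to enable at scale.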