🤖 AI Summary
This work targets small language models on code generation tasks with complex logic, where limited reasoning capacity often causes error loops and forces a trade-off between efficiency and accuracy. To address this, we propose DebateCoder, a multi-agent collaborative framework that coordinates three specialized agents (User, Technical, and Quality Assurance) through an adaptive confidence-gating mechanism with a 95% threshold, an orthogonal pre-generation debate, and a post-generation debugging loop. This combination is intended to make cooperative reasoning both efficient and reliable. Evaluated on HumanEval, DebateCoder achieves a 70.12% Pass@1 score, outperforming MapCoder while reducing API costs by approximately 35%, which improves code generation under resource-constrained conditions.
📝 Abstract
While Large Language Models (LLMs) have catalyzed breakthroughs in automated code generation, Small Language Models (SLMs) often encounter reasoning bottlenecks and failure loops when addressing complex logical requirements. To overcome these challenges, we propose DebateCoder, a multi-agent collaborative framework designed to improve the reasoning ability of SLMs (e.g., Pangu-1B) in resource-constrained environments. DebateCoder uses a structured role-playing protocol with three agents: User Agent (A_UA), Technical Agent (A_TA), and Quality Assurance Agent (A_QA). It also includes an Adaptive Confidence Gating mechanism with a 95% threshold to balance accuracy and inference efficiency. In addition, we introduce a multi-turn deliberation module and a reviewer-guided analytical debugging loop for orthogonal pre-generation debate and post-generation refinement. Experiments on HumanEval and MBPP show that DebateCoder achieves 70.12% Pass@1 on HumanEval, outperforming MapCoder while reducing API overhead by about 35%. These results indicate that collaborative protocols can mitigate limitations of small-parameter models and provide a scalable, efficient approach to high-quality automated software engineering.
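The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how confidence is computed or exactly where the gate sits, so the sketch assumes the gate is applied to a first draft and that the debate is skipped when that draft's confidence reaches the 95% threshold. The names `Draft`, `solve`, `generate`, `agents`, `run_tests`, and `repair`, and the default round limits, are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

CONFIDENCE_THRESHOLD = 0.95  # the 95% gate named in the abstract


@dataclass
class Draft:
    code: str
    confidence: float  # assumed to lie in [0, 1]; its computation is not given in the source


def solve(
    task: str,
    generate: Callable[[str, List[str]], Draft],   # hypothetical model call
    agents: List[Callable[[str, List[str]], str]],  # e.g. User, Technical, QA agents
    run_tests: Callable[[str], Tuple[bool, str]],   # returns (passed, feedback)
    repair: Callable[[str, str, str], str],         # hypothetical reviewer-guided fix
    debate_rounds: int = 2,                         # assumed limit
    max_repairs: int = 3,                           # assumed limit
) -> Tuple[str, bool]:
    # First attempt with no debate notes.
    draft = generate(task, [])
    notes: List[str] = []

    # Assumed gate: only pay for the multi-agent debate when confidence is low.
    if draft.confidence < CONFIDENCE_THRESHOLD:
        for _ in range(debate_rounds):
            for agent in agents:
                # Each agent sees the task and all notes written so far.
                notes.append(agent(task, notes))
        draft = generate(task, notes)

    # Post-generation debugging loop: test, then repair using the feedback.
    code = draft.code
    passed, feedback = run_tests(code)
    attempts = 0
    while not passed and attempts < max_repairs:
        code = repair(task, code, feedback)
        passed, feedback = run_tests(code)
        attempts += 1
    return code, passed
```

Under these assumptions, a high-confidence draft costs one generation call plus one test run, while a low-confidence draft adds `debate_rounds * len(agents)` agent calls and a second generation. That is one plausible way a gate of this kind could trade accuracy against inference cost.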