COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis

📅 2024-08-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak semantic understanding and significantly inferior end-to-end performance of small-scale LLMs (e.g., 7B-parameter models) in code debugging, this work proposes: (1) DEBUGEVAL—the first multi-granularity debugging benchmark covering the bug localization, bug identification, code repair, and code recognition stages—systematically exposing the limitations of 7B models in deep semantic reasoning; (2) COAST—a multi-agent collaborative data synthesis framework integrating role-based prompting, program analysis, and execution-feedback-driven iterative generation—to autonomously produce high-quality, diverse debugging data; (3) supervised fine-tuning of a 7B model on COAST-generated data, achieving substantial gains over both human-annotated and GPT-4-synthesized data on DEBUGEVAL, with debugging performance approaching that of GPT-3.5 and markedly narrowing the gap with large models. This work establishes the first scalable, interpretable multi-agent paradigm for debugging data synthesis.
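The execution-feedback-driven iterative generation described above can be sketched as a generate–execute–retry loop. This is a minimal illustration, not COAST's actual implementation: `generator_agent` is a hypothetical stand-in for an LLM role (here it returns canned code), and the validation step simply runs candidate fixes against a unit test in a subprocess.

```python
import subprocess
import sys
import tempfile

def generator_agent(task, feedback=None):
    """Hypothetical stand-in for an LLM 'generator' role: proposes a
    (buggy code, fixed code) training pair. Here it returns a canned
    example and only produces a correct fix once it sees feedback."""
    buggy = "def add(a, b):\n    return a - b\n"
    if feedback is None:
        fixed = buggy  # first attempt: the 'repair' is still wrong
    else:
        fixed = "def add(a, b):\n    return a + b\n"
    return buggy, fixed

def execute_and_check(code, test):
    """Critic step: run the candidate fix plus a unit test in a fresh
    interpreter and return (passed, stderr) as execution feedback."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test + "\n")
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, text=True)
    return result.returncode == 0, result.stderr

def synthesize(task, test, max_rounds=3):
    """Keep a sample only once the fixed code actually passes the test;
    otherwise feed the error trace back to the generator and retry."""
    feedback = None
    for _ in range(max_rounds):
        buggy, fixed = generator_agent(task, feedback)
        passed, feedback = execute_and_check(fixed, test)
        if passed:
            return {"task": task, "buggy": buggy, "fixed": fixed}
    return None  # discard samples that never verify

sample = synthesize("implement add", "assert add(2, 3) == 5")
```

The key design point mirrored here is that synthesized pairs are filtered by actual execution rather than accepted on faith, so only verifiable debugging data reaches fine-tuning.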

📝 Abstract
Code debugging is a vital stage of software development, essential for ensuring the reliability and performance of Large Language Models (LLMs) in code generation tasks. Human debugging typically follows a multi-stage process, which includes Bug Localization, Bug Identification, Code Repair, and Code Recognition. However, existing code debugging benchmarks predominantly focus on the Code Repair stage, which offers only a limited perspective on evaluating the debugging capabilities of LLMs. In this paper, we introduce DEBUGEVAL, a comprehensive benchmark for evaluating the debugging abilities of LLMs by emulating the multi-stage human debugging process. Through evaluation on DEBUGEVAL, we observe that 7B-scale models consistently underperform compared to their larger counterparts, highlighting their limitations in comprehending code semantics. To address this, we propose the COmmunicative Agent-based data SynThesis (COAST) framework, which employs a multi-agent system to generate high-quality training data for supervised fine-tuning (SFT). Experimental results demonstrate that COAST-generated data outperform human-curated and GPT-4-generated data, enabling 7B-scale LLMs to achieve debugging performance comparable to GPT-3.5.
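The four debugging stages named in the abstract can be seen as four tasks posed over the same buggy program. The sketch below is illustrative only — the field names and exact-match scoring are mine, not DEBUGEVAL's actual schema:

```python
from dataclasses import dataclass

@dataclass
class DebugExample:
    """One buggy program viewed through the multi-stage process.
    Field names are illustrative, not DEBUGEVAL's real schema."""
    code: str              # the buggy program
    bug_line: int          # Bug Localization target: which line is wrong
    bug_type: str          # Bug Identification target: what kind of bug
    repaired_code: str     # Code Repair target: the corrected program

def score_localization(example, predicted_line):
    """Exact-match scoring for the Bug Localization stage."""
    return int(predicted_line == example.bug_line)

def score_recognition(example, chosen_code):
    """Code Recognition as picking the correct variant of the program."""
    return int(chosen_code == example.repaired_code)

ex = DebugExample(
    code="def mean(xs):\n    return sum(xs) / len(xs) + 1",
    bug_line=2,
    bug_type="logic error",
    repaired_code="def mean(xs):\n    return sum(xs) / len(xs)",
)
```

Framing the stages this way makes the paper's point concrete: a repair-only benchmark exercises just one of these targets, while the others probe whether a model can locate and name the fault at all.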
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Code Debugging
Performance Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

COAST Framework
DEBUGEVAL Benchmark
Collaborative Data Generation
Weiqing Yang
Department of Computer Science and Technology, Northeastern University, China
Hanbin Wang
Peking University
Natural Language Processing, Code Intelligence, Information Retrieval
Zhenghao Liu
Northeastern University
NLP, Information Retrieval
Xinze Li
Department of Computer Science and Technology, Northeastern University, China
Yukun Yan
Tsinghua University
Large Language Model
Shuo Wang
Department of Computer Science and Technology, Institute for AI, Tsinghua University, China
Yu Gu
Department of Computer Science and Technology, Northeastern University, China
Minghe Yu
Software College, Northeastern University, China
Zhiyuan Liu
Department of Computer Science and Technology, Institute for AI, Tsinghua University, China
Ge Yu
Department of Computer Science and Technology, Northeastern University, China