🤖 AI Summary
To address the weak semantic understanding and markedly inferior end-to-end debugging performance of small-scale LLMs (e.g., 7B-parameter models), this work makes three contributions. (1) DEBUGEVAL, the first multi-granularity debugging benchmark, covering the bug localization, bug identification, code repair, and code recognition stages and systematically exposing the limitations of 7B models in deep semantic reasoning. (2) COAST, a multi-agent collaborative data-synthesis framework that integrates role-based prompting, program analysis, and execution-feedback-driven iterative generation to autonomously produce high-quality, diverse debugging data. (3) Supervised fine-tuning of a 7B model on COAST-generated data, which yields substantial gains over both human-annotated and GPT-4-synthesized data on DEBUGEVAL, approaching the debugging performance of GPT-3.5 and markedly narrowing the gap with large models. Together, these results establish the first scalable, interpretable multi-agent paradigm for debugging-data synthesis.
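The summary does not spell out COAST's internals, but its description (role-based prompting plus execution-feedback-driven iterative generation) maps naturally onto a simple agent loop. The sketch below is illustrative only: `call_llm`, the role prompts, and the keep/discard policy are assumptions for exposition, not the paper's actual implementation.

```python
import subprocess
import sys
import tempfile

def call_llm(role_prompt: str, user_prompt: str) -> str:
    """Placeholder chat-completion call; wire in a real LLM client here."""
    raise NotImplementedError

def run_with_tests(code: str, tests: str, timeout: int = 10) -> bool:
    """Run candidate code plus assert-style tests in a subprocess;
    a zero exit code serves as the execution-feedback 'pass' signal."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n\n" + tests)
        path = f.name
    result = subprocess.run([sys.executable, path],
                            capture_output=True, timeout=timeout)
    return result.returncode == 0

def synthesize_sample(task: str, max_rounds: int = 3):
    """One hypothetical COAST-style round-trip: generate a correct solution
    and tests, inject a bug, then iteratively repair with execution feedback.
    Samples that never pass their tests are discarded rather than kept for SFT."""
    correct = call_llm("You write correct, well-tested Python.", task)
    tests = call_llm("You write plain assert-based unit tests.", task)
    buggy = call_llm("You inject one realistic bug into the given code.", correct)
    feedback = ""
    for _ in range(max_rounds):
        fix = call_llm("You locate and repair the bug in the given code.",
                       buggy + feedback)
        if run_with_tests(fix, tests):  # execution feedback gates data quality
            return {"buggy": buggy, "fixed": fix, "tests": tests}
        feedback = "\n# The previous repair failed its tests; try again."
    return None  # discard samples that never converge
```

The key design point this sketch captures is that an executed test suite, not a single LLM's self-assessment, decides which synthesized debugging samples survive into the SFT corpus.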
📝 Abstract
Code debugging is a vital stage of software development, essential for ensuring the reliability and performance of Large Language Models (LLMs) in code generation tasks. Human debugging typically follows a multi-stage process comprising Bug Localization, Bug Identification, Code Repair, and Code Recognition. However, existing code debugging benchmarks focus predominantly on the Code Repair stage, which offers only a limited perspective for evaluating the debugging capabilities of LLMs. In this paper, we introduce DEBUGEVAL, a comprehensive benchmark that evaluates the debugging abilities of LLMs by emulating the multi-stage human debugging process. Evaluating on DEBUGEVAL, we observe that 7B-scale models consistently underperform their larger counterparts, highlighting their limitations in comprehending code semantics. Motivated by this, we propose the COmmunicative Agent-based data SynThesis (COAST) framework, which employs a multi-agent system to generate high-quality training data for supervised fine-tuning (SFT). Experimental results demonstrate that COAST-generated data outperform human-curated and GPT-4-generated data, enabling 7B-scale LLMs to achieve debugging performance comparable to GPT-3.5.
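To make the four-stage setup concrete, here is a minimal, hypothetical representation of a DEBUGEVAL-style task instance. The field names, schema, and toy example are assumptions for illustration, not the benchmark's actual data format.

```python
from dataclasses import dataclass

# The four DEBUGEVAL stages, as named in the abstract.
TASKS = ("bug_localization", "bug_identification",
         "code_repair", "code_recognition")

@dataclass
class DebugSample:
    task: str        # one of TASKS
    buggy_code: str  # the program under inspection
    question: str    # the task-specific query posed to the model
    answer: str      # gold label: a line number, bug type, patch, or verdict

# A toy bug-localization instance: the off-by-one on line 2 is the fault.
sample = DebugSample(
    task="bug_localization",
    buggy_code="def mean(xs):\n    return sum(xs) / len(xs) + 1",
    question="Which line contains the bug?",
    answer="line 2",
)
print(sample.task, "->", sample.answer)
```

Framing all four stages over the same (buggy code, question, answer) shape is what lets a single benchmark probe localization and identification, not just end-to-end repair.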