🤖 AI Summary
Background: Existing LLM-based approaches to software debugging typically address isolated steps and struggle with complex, real-world defects that require holistic reasoning. Method: This paper proposes FixAgent, an end-to-end debugging framework built on a three-level multi-agent architecture grounded in a cognitive model of how developers debug. Rather than acting as generalist experts, agents are specialized by debugging phase: fault localization, root-cause analysis, and patch generation. FixAgent combines LLM-driven program analysis and patch synthesis with a dynamic task-coordination mechanism, enabling adaptive repair without requiring access to the ground-truth root-cause code. Contribution/Results: On the repository-level benchmark Defects4J, FixAgent fixes 1.25–2.56× as many bugs as state-of-the-art methods, supporting the effectiveness of cognition-guided multi-agent collaboration for complex debugging tasks.
📝 Abstract
Software debugging is a time-consuming endeavor involving a series of steps, such as fault localization and patch generation, each of which requires thorough analysis and a deep understanding of the underlying logic. While large language models (LLMs) show promising potential in coding tasks, their performance in debugging remains limited: current LLM-based methods often focus on isolated steps and struggle with complex bugs. In this paper, we propose FixAgent, the first end-to-end framework for unified debugging through multi-agent synergy. It mimics developers' entire cognitive process, with each agent specialized as a particular component of that process rather than mirroring the actions of an independent expert, as in previous multi-agent systems. Agents are coordinated through a three-level design that follows a cognitive model of debugging, allowing adaptive handling of bugs of varying complexity. Experiments on a wide range of benchmarks show that FixAgent significantly outperforms state-of-the-art repair methods, fixing 1.25$\times$ to 2.56$\times$ as many bugs as the baselines on the repo-level benchmark Defects4J. Unlike the baselines, FixAgent achieves this without requiring ground-truth root-cause code statements. Our source code is available at https://github.com/AcceptePapier/UniDebugger.
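The abstract does not spell out how the three levels cooperate. As a purely hypothetical sketch of the general idea of adaptive, escalating coordination (an assumption for illustration, not FixAgent's actual design or API), the Python below runs progressively deeper agent pipelines: patching only; then localization and patching; then localization, root-cause analysis, and patching. It stops at the first level whose patch passes a test oracle. The agents are rule-based stand-ins for LLM calls, and every name (`DebugState`, `LEVELS`, `debug`, `passes_tests`, and the agent functions) is invented here.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class DebugState:
    """Hypothetical scratch state shared by the agents of one level."""
    code: str
    notes: Dict[str, str] = field(default_factory=dict)
    patch: Optional[str] = None


Agent = Callable[[DebugState], None]


def passes_tests(code: str) -> bool:
    """Toy test oracle: the program must define add(a, b) returning a + b."""
    namespace: Dict[str, Any] = {}
    exec(code, namespace)
    add = namespace["add"]
    return add(2, 3) == 5 and add(-1, 4) == 3


def localizer(state: DebugState) -> None:
    """Toy fault localization: flag the first line containing a return."""
    for lineno, line in enumerate(state.code.splitlines(), start=1):
        if "return" in line:
            state.notes["location"] = str(lineno)
            return


def root_cause_analyzer(state: DebugState) -> None:
    """Toy root-cause analysis: explain why the flagged line is wrong."""
    if "location" in state.notes:
        state.notes["root_cause"] = "operator '-' should be '+'"


def patcher(state: DebugState) -> None:
    """Toy patch generation whose quality depends on upstream notes."""
    if "root_cause" in state.notes:
        state.patch = state.code.replace("a - b", "a + b")
    elif "location" in state.notes:
        # Knows where the bug is but not why: a plausible but wrong guess.
        state.patch = state.code.replace("a - b", "b - a")
    else:
        # No analysis at all: nothing to go on, so the code is unchanged.
        state.patch = state.code


# Hypothetical levels: each adds one more analysis stage before patching.
LEVELS: List[List[Agent]] = [
    [patcher],
    [localizer, patcher],
    [localizer, root_cause_analyzer, patcher],
]


def debug(code: str, oracle: Callable[[str], bool]) -> Tuple[Optional[int], Optional[str]]:
    """Try each level in order; stop at the first patch that passes the oracle."""
    for level, agents in enumerate(LEVELS, start=1):
        state = DebugState(code)  # fresh scratch state for every level
        for agent in agents:
            agent(state)
        if state.patch is not None and oracle(state.patch):
            return level, state.patch
    return None, None


buggy = "def add(a, b):\n    return a - b\n"
level, patch = debug(buggy, passes_tests)
print(level)  # 3
```

In this toy run the bug is only fixed at level 3, because the first two levels lack the root-cause note that the patcher needs. A simpler bug would be fixed at a lower level without invoking the extra agents. In a real system, the stand-in agents would be LLM calls and the oracle would run the project's test suite.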