AgentFL: Scaling LLM-based Fault Localization to Project-Level Context

📅 2024-03-25
🏛️ arXiv.org
📈 Citations: 19
Influential: 2
🤖 AI Summary
Existing LLM-based fault localization methods operate effectively at the method or class level but struggle to scale to entire software systems. Method: This paper proposes AgentFL, a project-level fault localization framework built on multi-agent collaboration. Modeled on a human developer's debugging workflow, it frames fault localization as a three-step process of comprehension, navigation, and confirmation, with agents that draw on auxiliary strategies including Test Behavior Tracking, Document-Guided Search, and Multi-Round Dialogue. Contribution/Results: Evaluated on Defects4J v1.2.0, AgentFL achieves a Top-1 localization accuracy of 39.7% (157/395 bugs), outperforming prior LLM-based approaches and complementing state-of-the-art learning-based techniques. It localizes a single bug in an average of 97 seconds at a cost of only $0.074. By mitigating LLMs' long-context comprehension limitations, this work bridges the gap from method-level to system-wide fault localization.

📝 Abstract
Fault Localization (FL) is an essential step in the debugging process. With their strong code comprehension capabilities, recent Large Language Models (LLMs) have demonstrated promising performance in diagnosing bugs in code. Nevertheless, due to LLMs' limited ability to handle long contexts, existing LLM-based fault localization remains confined to localizing bugs within a small code scope (i.e., a method or a class) and struggles to diagnose bugs in a large code scope (i.e., an entire software system). To address this limitation, this paper presents AgentFL, a multi-agent system based on ChatGPT for automated fault localization. By simulating the behavior of a human developer, AgentFL models the FL task as a three-step process involving comprehension, navigation, and confirmation. Within each step, AgentFL employs agents with diversified expertise, each of which utilizes different tools to handle specific tasks. In particular, we adopt a series of auxiliary strategies, such as Test Behavior Tracking, Document-Guided Search, and Multi-Round Dialogue, to overcome the challenges in each step. The evaluation on the widely used Defects4J-V1.2.0 benchmark shows that AgentFL can localize 157 out of 395 bugs within Top-1, outperforming other LLM-based approaches and exhibiting complementarity to state-of-the-art learning-based techniques. Additionally, we confirm the indispensability of AgentFL's components through an ablation study and demonstrate its usability through a user study. Finally, a cost analysis shows that AgentFL spends an average of only 0.074 dollars and 97 seconds on a single bug.
Problem

Research questions and friction points this paper is trying to address.

Scaling LLM-based fault localization
Handling large code scopes
LLMs' limited long-context comprehension
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system for fault localization
Test Behavior Tracking strategy
Document-Guided Search technique
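
The comprehension–navigation–confirmation pipeline described above can be sketched as a toy loop. All class names, data fields, and scoring here are hypothetical illustrations of the paper's three stages, not AgentFL's actual API; the real system drives each agent with ChatGPT prompts and tool calls rather than the hard-coded filters below.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    """A reported defect, identified by its failing test."""
    failing_test: str


class ComprehensionAgent:
    """Stage 1: understand what the failing test exercises
    (stands in for Test Behavior Tracking)."""
    def run(self, bug, project):
        return [m for m, info in project.items()
                if bug.failing_test in info["covered_by"]]


class NavigationAgent:
    """Stage 2: narrow candidates using documentation
    (stands in for Document-Guided Search)."""
    def run(self, candidates, project, symptom):
        return [m for m in candidates if symptom in project[m]["doc"]]


class ConfirmationAgent:
    """Stage 3: rank the surviving candidates
    (stands in for Multi-Round Dialogue review)."""
    def run(self, candidates, project):
        return sorted(candidates,
                      key=lambda m: project[m]["suspiciousness"],
                      reverse=True)


def localize(bug, project, symptom):
    covered = ComprehensionAgent().run(bug, project)
    narrowed = NavigationAgent().run(covered, project, symptom)
    return ConfirmationAgent().run(narrowed, project)


# Toy project: three methods with coverage info, docs, and a mock score.
project = {
    "Math.round": {"covered_by": ["testRound"], "doc": "rounding overflow",
                   "suspiciousness": 0.9},
    "Math.abs":   {"covered_by": ["testRound"], "doc": "absolute value",
                   "suspiciousness": 0.2},
    "Str.join":   {"covered_by": ["testJoin"],  "doc": "joins strings",
                   "suspiciousness": 0.5},
}

ranking = localize(Bug("testRound"), project, symptom="overflow")
print(ranking)  # Top-1 candidate first: ['Math.round']
```

Each stage shrinks the search space before the next one runs, which is how the paper sidesteps feeding an entire project into a single long-context prompt.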