LocAgent: Graph-Guided LLM Agents for Code Localization

📅 2025-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Code localization, precisely identifying which code locations to modify from a natural language description, is a critical challenge in software maintenance, hindered by the need for semantic alignment across hierarchical code structures and multi-level dependencies. This paper introduces LocAgent, the first graph-guided large language model (LLM) agent framework for the task: it models the codebase as a heterogeneous directed graph integrating syntactic structure and semantic dependencies, enabling multi-hop reasoning over a lightweight representation. Leveraging this graph, the authors fine-tune Qwen-2.5-Coder-Instruct-32B as the search agent. The approach reaches 92.7% accuracy on file-level localization and improves downstream GitHub issue resolution (Pass@10) by 12%, matching state-of-the-art proprietary models while cutting cost by roughly 86%.

📝 Abstract
Code localization--identifying precisely where in a codebase changes need to be made--is a fundamental yet challenging task in software maintenance. Existing approaches struggle to efficiently navigate complex codebases when identifying relevant code sections. The challenge lies in bridging natural language problem descriptions with the appropriate code elements, often requiring reasoning across hierarchical structures and multiple dependencies. We introduce LocAgent, a framework that addresses code localization through graph-based representation. By parsing codebases into directed heterogeneous graphs, LocAgent creates a lightweight representation that captures code structures (files, classes, functions) and their dependencies (imports, invocations, inheritance), enabling LLM agents to effectively search and locate relevant entities through powerful multi-hop reasoning. Experimental results on real-world benchmarks demonstrate that our approach significantly enhances accuracy in code localization. Notably, our method with the fine-tuned Qwen-2.5-Coder-Instruct-32B model achieves comparable results to SOTA proprietary models at greatly reduced cost (approximately 86% reduction), reaching up to 92.7% accuracy on file-level localization while improving downstream GitHub issue resolution success rates by 12% for multiple attempts (Pass@10). Our code is available at https://github.com/gersteinlab/LocAgent.
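The directed heterogeneous graph the abstract describes can be sketched in a few lines with Python's standard `ast` module. This is an illustrative assumption about the representation, not the paper's actual indexing code; the file name, entity names, and relation labels below are hypothetical:

```python
import ast
from collections import defaultdict

def build_graph(filename, source):
    """Parse one file into a (nodes, edges) pair: files, classes, and
    functions become typed nodes; contains/imports/inherits become
    typed directed edges -- a toy version of a heterogeneous code graph."""
    nodes = {f"file:{filename}": "file"}
    edges = defaultdict(list)  # source id -> [(relation, target id), ...]
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            cid = f"class:{node.name}"
            nodes[cid] = "class"
            edges[f"file:{filename}"].append(("contains", cid))
            for base in node.bases:  # inheritance edges
                if isinstance(base, ast.Name):
                    edges[cid].append(("inherits", f"class:{base.id}"))
            for item in node.body:  # methods as function nodes
                if isinstance(item, ast.FunctionDef):
                    fid = f"func:{node.name}.{item.name}"
                    nodes[fid] = "function"
                    edges[cid].append(("contains", fid))
        elif isinstance(node, ast.Import):
            for alias in node.names:  # import edges between files
                edges[f"file:{filename}"].append(("imports", f"file:{alias.name}"))
    return nodes, edges

# Hypothetical single-file codebase to index.
src = "import db\nclass Session(Base):\n    def login(self):\n        pass\n"
nodes, edges = build_graph("auth.py", src)
```

An agent could then search this index by entity kind or walk the typed edges, rather than reading raw files.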
Problem

Research questions and friction points this paper is trying to address.

Efficiently navigate complex codebases for code localization.
Bridge natural language descriptions with relevant code elements.
Improve accuracy and reduce costs in code localization tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based code representation for localization
Multi-hop reasoning with LLM agents
Cost-effective fine-tuned Qwen-2.5-Coder model
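The multi-hop reasoning point above can be made concrete with a small traversal primitive of the kind an agent might call as a tool. This is a hedged sketch under assumed data, not LocAgent's API; the edge list and entity names are invented for illustration:

```python
from collections import deque

# Hypothetical typed edge list: source entity -> [(relation, target), ...]
edges = {
    "func:routes.handle_login": [("invokes", "func:auth.Session.login")],
    "func:auth.Session.login": [("invokes", "func:db.query")],
    "func:db.query": [],
}

def traverse(start, max_hops):
    """BFS over typed edges, returning {entity: hop count} for every
    entity reachable within max_hops -- one multi-hop step per query."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # hop budget exhausted along this path
        for _, target in edges.get(node, []):
            if target not in seen:
                seen[target] = seen[node] + 1
                queue.append(target)
    return seen

# Two hops from the issue's entry point reach the indirect dependency.
reachable = traverse("func:routes.handle_login", 2)
```

Bounding the hop count keeps each tool call cheap, which is consistent with the cost reduction the summary reports.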