🤖 AI Summary
Software fault localization, i.e., precisely identifying fault-relevant code in large codebases, is hampered by limited LLM reasoning accuracy and excessive contextual redundancy. This paper proposes a collaborative LLM-agent localization framework with three technical contributions: (1) priority-based scheduling of LLM-guided actions; (2) fine-grained action decomposition coupled with semantic relevance scoring; and (3) distance-aware dynamic context pruning. The method combines code semantic distance modeling, a structured action space, and adaptive context compression. Evaluated on the SWE-bench Lite benchmark, it achieves a 65.33% function-level match rate, setting a new open-source state of the art. When integrated with patch generation, the end-to-end resolved rate improves by 6.33 percentage points, demonstrating both effectiveness and practical utility.
📝 Abstract
Recent developments in Large Language Model (LLM) agents are revolutionizing Autonomous Software Engineering (ASE), enabling automated coding, problem fixes, and feature improvements. However, localization -- precisely identifying software problems by navigating to relevant code sections -- remains a significant challenge. Current approaches often yield suboptimal results due to a lack of effective integration between LLM agents and precise code search mechanisms. This paper introduces OrcaLoca, an LLM agent framework that improves the accuracy of software issue localization by integrating priority-based scheduling of LLM-guided actions, action decomposition with relevance scoring, and distance-aware context pruning. Experimental results demonstrate that OrcaLoca becomes the new open-source state-of-the-art (SOTA) in function match rate (65.33%) on SWE-bench Lite. It also improves the final resolved rate of an open-source framework by 6.33 percentage points through its patch generation integration.