BLAgent: Agentic RAG for File-Level Bug Localization

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

File-level bug localization remains a bottleneck in software maintenance, and existing retrieval-augmented generation (RAG) pipelines rely on static retrieval and lack the reasoning needed to identify faulty files accurately. This work proposes BLAgent, an agentic RAG framework for file-level bug localization that combines path-augmented AST-based chunking, dual-perspective query transformation capturing both structural and behavioral signals, and two-phase agentic reranking that pairs symbolic inspection with evidence-grounded reasoning. On SWE-bench Lite, BLAgent achieves over 78% Top-1 accuracy with open-source models and over 86% with a closed-source model, is over 18× cheaper than the strongest baseline using the same model, and improves end-to-end repair success by over 20% when integrated into an automated program repair (APR) framework.
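For readers unfamiliar with the metric, the sketch below shows one common way to compute Top-k accuracy for file-level localization. It assumes an instance counts as a hit when any of the top k ranked files is among the files actually changed by the fix; the paper's exact definition is not given here, and the function name and data layout are hypothetical.

```python
def top_k_accuracy(ranked_files, gold_files, k=1):
    """Fraction of instances with at least one gold file in the top-k ranking.

    ranked_files: one ranked list of file paths per bug report.
    gold_files:   one set of truly buggy file paths per bug report.
    Assumption: a single correct file within the top k counts as a hit.
    """
    if not ranked_files:
        return 0.0
    hits = 0
    for ranked, gold in zip(ranked_files, gold_files):
        if any(path in gold for path in ranked[:k]):
            hits += 1
    return hits / len(ranked_files)


# Two illustrative bug reports (made-up paths, not benchmark data).
ranked = [["a.py", "b.py"], ["c.py", "d.py"]]
gold = [{"a.py"}, {"d.py"}]
print(top_k_accuracy(ranked, gold, k=1))
print(top_k_accuracy(ranked, gold, k=2))
```

Under this definition, Top-1 is the strictest setting: only the single highest-ranked file is checked.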

📝 Abstract

Bug localization remains a key bottleneck in downstream software maintenance tasks, including root cause analysis, triage, and automated program repair (APR), despite recent advances in large language model (LLM)-based repair systems. File-level bug localization is especially critical in hierarchical pipelines, where errors can propagate to downstream stages such as statement-level localization or patch generation. While Retrieval-Augmented Generation (RAG) offers a promising direction for grounding LLMs in repository context, existing RAG pipelines rely on static retrieval and lack the reasoning needed to identify faulty code accurately. In this work, we present BLAgent, a novel agentic RAG framework for file-level bug localization that integrates three key ideas: (i) code structure-aware repository encoding with path-augmented AST-based chunking, (ii) dual-perspective query transformation capturing both structural and behavioral signals, and (iii) two-phase agentic reranking combining symbolic inspection with evidence-grounded reasoning. Unlike prior graph-based or multi-hop agentic approaches, BLAgent performs bounded reasoning over a compact candidate set, balancing accuracy and cost. On SWE-bench Lite, BLAgent attains over 78% Top-1 accuracy with open-source models and over 86% with a closed-source model, while being over 18x cheaper than the strongest baseline using the same model. When integrated into an APR framework, it improves end-to-end repair success by over 20%.
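To make idea (i) concrete, here is a minimal sketch of what path-augmented AST-based chunking could look like for Python sources. It is an illustration under stated assumptions, not BLAgent's implementation: the function name `path_augmented_chunks`, the `# path:` header format, and the choice to chunk at class and function boundaries are all hypothetical. It uses only the standard-library `ast` module.

```python
import ast


def path_augmented_chunks(source, file_path):
    """Split a Python file into class/function chunks tagged with their location.

    Each chunk's text starts with a header holding the file path and the
    qualified symbol name, so a retriever sees where the code lives.
    Assumption: the header format and chunk granularity are illustrative.
    """
    tree = ast.parse(source)
    chunks = []

    def visit(node, scope):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                qualified = ".".join(scope + [child.name])
                body = ast.get_source_segment(source, child)
                header = f"# path: {file_path} :: {qualified}"
                chunks.append(
                    {"path": file_path, "symbol": qualified, "text": f"{header}\n{body}"}
                )
                # Descend into classes so methods also become their own chunks.
                if isinstance(child, ast.ClassDef):
                    visit(child, scope + [child.name])

    visit(tree, [])
    return chunks


example = '''
class Parser:
    def parse(self, text):
        return text.split()

def main():
    return Parser().parse("a b")
'''

for chunk in path_augmented_chunks(example, "pkg/parser.py"):
    print(chunk["symbol"])
```

Prefixing each chunk with its path and qualified name is one way to keep repository structure visible to the retriever after a file has been split; functions nested inside other functions are left inside their parent chunk in this sketch.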

Problem

Research questions and friction points this paper is trying to address.

bug localization

file-level

Retrieval-Augmented Generation

software maintenance

LLM

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic RAG

file-level bug localization

AST-based chunking