Beyond Localization: Recoverable Headroom and Residual Frontier in Repository-Level RAG-APR

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Repository-level RAG-APR (retrieval-augmented automated program repair) improves with stronger fault localization, but recoverable headroom and a residual frontier remain beyond it. This work proposes a unified evaluation protocol on SWE-bench Lite to analyze the post-localization factors that influence repair effectiveness: candidate diversity, the quality of added context, and interface design. The protocol combines Oracle Localization, within-pool Best-of-K, fixed-interface added-context probes with same-token filler controls and same-repository hard negatives, and a common-wrapper oracle check, applied to Agentless, KGCompass, and ExpeRepair. Even with Oracle Localization, success for all three systems stays below 50%; gains from candidate diversity within the sampled 10-patch pools saturate quickly; most informative added-context conditions outperform their matched controls; under a common wrapper, gains remain large for KGCompass and ExpeRepair, while Agentless varies more with builder choice; and the best fixed probe adds only 6 solved instances beyond the native three-system Solved@10 union.
📝 Abstract
Repository-level automated program repair (APR) increasingly treats stronger localization as the main path to better repair. We ask a more targeted question: once localization is strengthened, which post-localization levers still provide recoverable gains, which are bounded within our protocol, and what residual frontier remains? We study this question on SWE-bench Lite with three representative repository-level RAG-APR paradigms, Agentless, KGCompass, and ExpeRepair. Our protocol combines Oracle Localization, within-pool Best-of-K, fixed-interface added context probes with per-condition same-token filler controls and same-repository hard negatives, and a common-wrapper oracle check. Oracle Localization improves all three systems, but Oracle success still stays below 50%. Extra candidate diversity still helps inside the sampled 10-patch pools, but that headroom saturates quickly. Under the two fixed interfaces, most informative added context conditions still outperform their own matched controls. The common-wrapper check shows different system responses: under a common wrapper, gains remain large for KGCompass and ExpeRepair, while Agentless changes more with builder choice. Prompt-level fusion still leaves a large residual frontier: the best fixed probe adds only 6 solved instances beyond the native three-system Solved@10 union. Overall, stronger localization, bounded search, evidence quality, and interface design all shape repository-level repair outcomes.
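The bookkeeping behind terms such as within-pool Best-of-K, the Solved@10 union, and the residual frontier can be illustrated with a minimal Python sketch. It is not the paper's evaluation code: the function names, the toy instance ids, and the reading of Solved@K as "at least one of the first K sampled patches passes" are assumptions made only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical layout: instance id -> pass/fail outcome of each sampled patch.
Pool = Dict[str, List[bool]]


def solved_at_k(pool: Pool, k: int) -> Set[str]:
    """Instances where at least one of the first k candidates passes (assumed reading of Solved@K)."""
    return {iid for iid, outcomes in pool.items() if any(outcomes[:k])}


def union_solved_at_k(systems: Dict[str, Pool], k: int) -> Set[str]:
    """Instances solved by at least one system within its own k-candidate pool."""
    solved: Set[str] = set()
    for pool in systems.values():
        solved |= solved_at_k(pool, k)
    return solved


def probe_gain(probe_solved: Set[str], native_union: Set[str]) -> Set[str]:
    """Instances a probe solves that no native system solved."""
    return probe_solved - native_union


# Toy data with placeholder names; pools of 3 patches instead of 10.
TOY_SYSTEMS: Dict[str, Pool] = {
    "system_a": {
        "bug-1": [False, True, False],
        "bug-2": [False, False, False],
        "bug-3": [False, False, False],
        "bug-4": [False, False, False],
    },
    "system_b": {
        "bug-1": [True, False, False],
        "bug-2": [False, False, True],
        "bug-3": [False, False, False],
        "bug-4": [False, False, False],
    },
}
TOY_PROBE_SOLVED: Set[str] = {"bug-1", "bug-3"}  # hypothetical probe result

all_instances = set(TOY_SYSTEMS["system_a"])
native_union = union_solved_at_k(TOY_SYSTEMS, k=3)
extra = probe_gain(TOY_PROBE_SOLVED, native_union)
frontier = all_instances - native_union - TOY_PROBE_SOLVED

print(sorted(native_union), sorted(extra), sorted(frontier))
```

In this toy setting the residual frontier is whatever remains unsolved after taking the native union and the probe's additions together. Sweeping `k` from 1 up to the pool size in `solved_at_k` is one simple way to see how quickly candidate-diversity headroom saturates.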
Problem

Research questions and friction points this paper is trying to address.

Automated Program Repair
Repository-Level RAG
Localization
Residual Frontier
Repair Headroom
Innovation

Methods, ideas, or system contributions that make the work stand out.

repository-level APR
recoverable headroom
residual frontier
RAG-APR
oracle localization