PatchRecall: Patch-Driven Retrieval for Automated Program Repair

📅 2026-04-12
🤖 AI Summary
This work addresses file retrieval for automated program repair: finding the files relevant to a given bug accurately and efficiently. The authors propose a hybrid approach that combines semantic search over the codebase, driven by the issue description, with the files edited in historically similar issues. Candidates from the two paths are merged and reranked, which raises recall of relevant files while keeping the retrieved set small. On the SWE-Bench benchmark, the method achieves higher recall without significantly increasing the number of retrieved files, giving downstream repair a lower-noise candidate set.

📝 Abstract
Retrieving the correct set of files from a large codebase is a crucial step in Automated Program Repair (APR). High recall is necessary to ensure that the relevant files are included, but simply increasing the number of retrieved files introduces noise and degrades efficiency. To address this tradeoff, we propose PatchRecall, a hybrid retrieval approach that balances recall with conciseness. Our method combines two complementary strategies: (1) codebase retrieval, where the current issue description is matched against the codebase to surface potentially relevant files, and (2) history-based retrieval, where similar past issues are leveraged to identify edited files as candidate targets. Candidate files from both strategies are merged and reranked to produce the final retrieval set. Experiments on SWE-Bench demonstrate that PatchRecall achieves higher recall without significantly increasing retrieved file count, enabling more effective APR.
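The two retrieval paths and the merge-and-rerank step described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the bag-of-words cosine similarity stands in for whatever retriever the paper actually uses, and every name here (`codebase_retrieval`, `history_retrieval`, `patch_recall_sketch`, the weight `w_hist`, the cutoffs `k` and `final_k`, and the toy data) is a hypothetical chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def codebase_retrieval(issue, files, k):
    """Path 1: match the issue text against file contents, keep the top k."""
    q = Counter(tokenize(issue))
    scores = {p: cosine(q, Counter(tokenize(body))) for p, body in files.items()}
    top = sorted(scores, key=lambda p: (-scores[p], p))[:k]
    return {p: scores[p] for p in top if scores[p] > 0}


def history_retrieval(issue, past_issues, files, k):
    """Path 2: find the k most similar past issues, propose the files they edited."""
    q = Counter(tokenize(issue))
    sims = [(cosine(q, Counter(tokenize(h["text"]))), h) for h in past_issues]
    sims.sort(key=lambda pair: -pair[0])
    scores = {}
    for sim, h in sims[:k]:
        if sim <= 0:
            continue
        for path in h["edited_files"]:
            if path in files:  # ignore files that no longer exist
                scores[path] = max(scores.get(path, 0.0), sim)
    return scores


def patch_recall_sketch(issue, files, past_issues, k=2, final_k=3, w_hist=0.5):
    """Merge both candidate sets and rerank by a weighted score (assumed scheme)."""
    code = codebase_retrieval(issue, files, k)
    hist = history_retrieval(issue, past_issues, files, k)
    merged = {p: code.get(p, 0.0) + w_hist * hist.get(p, 0.0)
              for p in set(code) | set(hist)}
    return sorted(merged, key=lambda p: (-merged[p], p))[:final_k]


def recall(retrieved, gold):
    """Fraction of gold files that appear in the retrieved set."""
    return len(set(retrieved) & set(gold)) / len(gold) if gold else 0.0


# Toy data (hypothetical): the fix touches a file the issue text never mentions.
demo_files = {
    "auth/login.py": "def login(user, password): check password hash",
    "auth/session.py": "def refresh(token): rotate expiry",
    "ui/button.py": "def render(): draw button",
}
demo_history = [
    {"text": "login fails after password reset", "edited_files": ["auth/session.py"]},
]
demo_issue = "login fails with correct password"
demo_gold = ["auth/login.py", "auth/session.py"]

demo_result = patch_recall_sketch(demo_issue, demo_files, demo_history)
print(demo_result, recall(demo_result, demo_gold))
```

In the toy data, `auth/session.py` shares no tokens with the issue text, so the codebase path alone misses it; the history path recovers it because a similar past issue edited that file. The final cutoff `final_k` is what keeps the merged set from growing with each added path.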
Problem

Research questions and friction points this paper is trying to address.

Automated Program Repair
code retrieval
recall
codebase
file retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

PatchRecall
Automated Program Repair
hybrid retrieval
codebase retrieval
history-based retrieval
Mahir Labib Dihan
CSE, BUET
Natural Language Processing, Large Language Models, Geo Spatial
Faria Binta Awal
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET)
Md. Ishrak Ahsan
Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology (BUET)