RelRepair: Enhancing Automated Program Repair by Retrieving Relevant Code

📅 2025-09-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) often fail to generate correct patches in automated program repair (APR) because they lack project-specific context, such as domain-specific identifiers, code structures, and contextual relationships within a particular codebase. To address this, the authors propose RelRepair, which retrieves relevant project-specific code and injects it into the LLM's input prompt. RelRepair first identifies relevant function signatures by analyzing function names and code comments within the project, then performs deeper code analysis to retrieve code snippets relevant to the repair context. Evaluated on Defects4J V1.2, RelRepair successfully repairs 101 bugs; on ManySStuBs4J, it raises the overall fix rate to 48.3%, a 17.1% improvement over state-of-the-art LLM-based baselines. These results highlight the value of supplying relevant project-specific information to LLMs for APR.

📝 Abstract
Automated Program Repair (APR) has emerged as a promising paradigm for reducing debugging time and improving the overall efficiency of software development. Recent advances in Large Language Models (LLMs) have demonstrated their potential for automated bug fixing and other software engineering tasks. Nevertheless, the general-purpose nature of LLM pre-training means these models often lack the capacity to perform project-specific repairs, which require understanding of domain-specific identifiers, code structures, and contextual relationships within a particular codebase. As a result, LLMs may struggle to generate correct patches when the repair depends on project-specific information. To address this limitation, we introduce RelRepair, a novel approach that retrieves relevant project-specific code to enhance automated program repair. RelRepair first identifies relevant function signatures by analyzing function names and code comments within the project. It then conducts deeper code analysis to retrieve code snippets relevant to the repair context. The retrieved relevant information is then incorporated into the LLM's input prompt, guiding the model to generate more accurate and informed patches. We evaluate RelRepair on two widely studied datasets, Defects4J V1.2 and ManySStuBs4J, and compare its performance against several state-of-the-art LLM-based APR approaches. RelRepair successfully repairs 101 bugs in Defects4J V1.2. Furthermore, RelRepair achieves a 17.1% improvement in the ManySStuBs4J dataset, increasing the overall fix rate to 48.3%. These results highlight the importance of providing relevant project-specific information to LLMs, shedding light on effective strategies for leveraging LLMs in APR tasks.
Problem

Research questions and friction points this paper is trying to address.

LLMs lack project-specific knowledge for accurate program repair
General-purpose models struggle with domain-specific code understanding
Automated bug fixing requires contextual project information
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieves relevant project-specific code snippets
Incorporates retrieved code into LLM input prompts
Analyzes function signatures and comments for context
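
The retrieve-then-prompt pipeline described above can be sketched in a few lines. The following is a minimal, illustrative sketch, not the authors' implementation: it approximates signature-level relevance matching with simple token overlap between the buggy code and each project function's signature and comment, then injects the top matches into a repair prompt. All names here (`retrieve_relevant`, `build_repair_prompt`, the dictionary fields) are hypothetical.

```python
import re


def tokenize(text):
    # Lowercased word tokens; camelCase and snake_case identifiers
    # are split into their component words.
    words = re.findall(r"[A-Za-z]+", text)
    parts = []
    for w in words:
        parts.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", w))
    return {p.lower() for p in parts}


def retrieve_relevant(buggy_code, project_functions, top_k=2):
    """Rank project functions by token overlap between the buggy code
    and each function's signature plus its comment; keep the top_k."""
    query = tokenize(buggy_code)
    scored = []
    for fn in project_functions:
        sig_tokens = tokenize(fn["signature"] + " " + fn.get("comment", ""))
        scored.append((len(query & sig_tokens), fn))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fn for score, fn in scored[:top_k] if score > 0]


def build_repair_prompt(buggy_code, retrieved):
    """Inject the retrieved snippets into the LLM prompt as context."""
    context = "\n\n".join(fn["snippet"] for fn in retrieved)
    return (
        "Relevant project code:\n" + context
        + "\n\nBuggy code:\n" + buggy_code
        + "\n\nProvide a fixed version of the buggy code."
    )
```

In practice the retrieval step would use a stronger relevance model than raw token overlap, but the shape is the same: select a small amount of project-intrinsic code and place it in the prompt ahead of the buggy code.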