🤖 AI Summary
Existing LLM-based automated program repair (APR) approaches suffer from limited code-context understanding and incomplete test-suite coverage, often yielding "Draft Patches" that are partially correct or overfitted to the tests. To address this, the authors propose Refine, a patch refinement framework with three core components: (1) disambiguation of vague issue descriptions and code context to improve defect localization; (2) test-time scaling to generate diverse candidate patches; and (3) an LLM-driven code review mechanism that aggregates partially correct patches and refines them into complete, correct fixes. Refine is a modular component compatible with mainstream APR systems. On SWE-Bench Lite, Refine achieves a 51.67% resolution rate, boosting AutoCodeRover by 14.67%; it improves the resolution rate on SWE-Bench Verified by 12.2% and yields an average 14% improvement when integrated into multiple APR systems.
📝 Abstract
Large Language Models (LLMs) have recently shown strong potential in automatic program repair (APR), especially in repository-level settings where the goal is to generate patches from natural language issue descriptions, large codebases, and regression tests. Despite this promise, current LLM-based APR techniques often fail to produce correct fixes due to limited understanding of code context and over-reliance on incomplete test suites. As a result, they frequently generate Draft Patches: partially correct patches that either incompletely address the bug or overfit to the test cases. In this work, we propose a novel patch refinement framework, Refine, that systematically transforms Draft Patches into correct ones. Refine addresses three key challenges: disambiguating vague issue and code context, diversifying patch candidates through test-time scaling, and aggregating partial fixes via an LLM-powered code review process. We implement Refine as a general refinement module that can be integrated into both agent-based and workflow-based APR systems. Our evaluation on the SWE-Bench Lite benchmark shows that Refine achieves state-of-the-art results among workflow-based approaches and approaches the best-known performance across all APR categories. Specifically, Refine boosts AutoCodeRover's performance by 14.67%, achieving a score of 51.67% and surpassing all prior baselines. On SWE-Bench Verified, Refine improves the resolution rate by 12.2%, and when integrated across multiple APR systems, it yields an average improvement of 14%, demonstrating its broad effectiveness and generalizability. These results highlight refinement as a missing component in current APR pipelines and the potential of agentic collaboration to close the gap between near-correct and correct patches. We also open-source our code.
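The three stages the abstract names (disambiguation, test-time scaling, review-based aggregation) can be pictured as a simple pipeline. The sketch below is purely illustrative: every function name and data structure is a hypothetical stand-in (the paper's actual prompts, agents, and interfaces are not shown), and the LLM calls are replaced with trivial placeholders so the control flow runs end to end.

```python
# Hypothetical sketch of a Refine-style pipeline; all names are
# illustrative assumptions, not the paper's actual API.

def disambiguate(issue: str, code_context: str) -> str:
    """Stage 1: resolve vague issue text against the code context.
    Placeholder: in the real system an LLM would rewrite the issue."""
    return f"{issue.strip()} [context: {code_context.strip()}]"

def generate_candidates(spec: str, n: int = 3) -> list[str]:
    """Stage 2: test-time scaling, i.e. sampling several diverse
    candidate (draft) patches for the clarified problem."""
    return [f"patch-{i}: fix for {spec}" for i in range(n)]

def review_and_aggregate(candidates: list[str]) -> str:
    """Stage 3: LLM-powered code review that merges partially
    correct drafts. Placeholder: pick one candidate deterministically."""
    return max(candidates, key=len)

def refine(issue: str, code_context: str) -> str:
    """Run all three stages and return a single refined patch."""
    spec = disambiguate(issue, code_context)
    drafts = generate_candidates(spec)
    return review_and_aggregate(drafts)
```

In the real framework each placeholder would be an LLM (or multi-agent) step, and the aggregation stage would combine complementary partial fixes rather than select a single draft.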