Code Review Automation using Retrieval Augmented Generation

📅 2025-11-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing automated code review methods often produce off-topic or overly generic comments. To address this, we propose RARe, the first framework to integrate Retrieval-Augmented Generation (RAG) into code review: a dense retriever precisely identifies semantically relevant historical review cases from a repository, and a large language model leverages in-context learning to generate high-quality, issue-focused feedback. RARe synergistically combines retrieval accuracy with generative flexibility, significantly improving comment relevance and explainability. On two benchmark datasets, RARe achieves BLEU-4 scores of 12.32 and 12.96, outperforming current state-of-the-art methods. Comprehensive human evaluation and interpretability analysis further confirm its effectiveness and practical utility.

๐Ÿ“ Abstract
Code review is essential for maintaining software quality but is labor-intensive. Automated code review generation offers a promising solution to this challenge. Both deep learning-based generative techniques and retrieval-based methods have demonstrated strong performance on this task. Despite these advances, generated reviews can still be off-point or overly general. To address these issues, we introduce Retrieval-Augmented Reviewer (RARe), which leverages Retrieval-Augmented Generation (RAG) to combine retrieval-based and generative methods, explicitly incorporating external domain knowledge into the code review process. RARe uses a dense retriever to select the most relevant reviews from the codebase, which then enrich the input for a neural generator, exploiting the in-context learning capacity of large language models (LLMs), to produce the final review. RARe outperforms state-of-the-art methods on two benchmark datasets, achieving BLEU-4 scores of 12.32 and 12.96, respectively. Its effectiveness is further validated through a detailed human evaluation and a case study using an interpretability tool, demonstrating its practical utility and reliability.
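The retrieval half of the pipeline described above can be sketched as follows. This is a minimal illustration, not RARe's implementation: a term-frequency vector stands in for the paper's learned dense encoder, and the toy corpus of (diff, review) pairs is invented for demonstration.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a learned dense encoder: a term-frequency vector.
    # RARe's retriever uses dense neural embeddings instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_diff, corpus, k=2):
    """Return the k historical (diff, review) pairs most similar to the query diff."""
    q = embed(query_diff)
    ranked = sorted(corpus, key=lambda pair: cosine(q, embed(pair[0])), reverse=True)
    return ranked[:k]

# Hypothetical historical review cases.
corpus = [
    ("if user == None: return", "Use 'is None' instead of '== None'."),
    ("for i in range(len(xs)): print(xs[i])", "Iterate directly over the list."),
    ("open('f.txt').read()", "File handle is never closed; use a with-block."),
]
top = retrieve("if result == None: raise Error", corpus, k=1)
```

For the query diff above, the nearest neighbor is the `== None` case, so its review would be passed on as an in-context example for the generator.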
Problem

Research questions and friction points this paper is trying to address.

Automating the labor-intensive code review process to maintain software quality
Addressing the tendency of automated code reviews to be off-point or overly general
Integrating external domain knowledge into automated code review generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines retrieval-based and generative methods via RAG
Uses a dense retriever to select the most relevant historical reviews
Enriches the neural generator's input with retrieved examples, leveraging the in-context learning capacity of LLMs
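The generation step described in these bullets amounts to assembling a few-shot prompt from the retrieved (diff, review) pairs and handing it to an LLM. The sketch below shows only the prompt assembly; the template wording and the example pair are illustrative assumptions, not the paper's actual prompt.

```python
def build_prompt(retrieved, query_diff):
    """Assemble a few-shot prompt: retrieved (diff, review) exemplars, then the query.

    The retrieved pairs give the LLM concrete in-context examples of how
    similar code changes were reviewed, steering it toward issue-focused
    feedback rather than generic comments.
    """
    parts = ["Generate a code review comment for the final diff.\n"]
    for diff, review in retrieved:
        parts.append(f"Diff:\n{diff}\nReview: {review}\n")
    parts.append(f"Diff:\n{query_diff}\nReview:")
    return "\n".join(parts)

# Hypothetical exemplar retrieved in the previous step.
exemplars = [
    ("if user == None: return", "Use 'is None' instead of '== None'."),
]
prompt = build_prompt(exemplars, "while x == None: pass")
```

The completed prompt ends at `Review:`, leaving the LLM to fill in the comment for the query diff.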