Code Review Automation using Retrieval Augmented Generation

📅 2025-11-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing automated code review methods often produce off-topic or overly generic comments. To address this, we propose RARe, the first framework to integrate Retrieval-Augmented Generation (RAG) into code review: a dense retriever precisely identifies semantically relevant historical review cases from a repository, and a large language model leverages in-context learning to generate high-quality, issue-focused feedback. RARe synergistically combines retrieval accuracy with generative flexibility, significantly improving comment relevance and explainability. On two benchmark datasets, RARe achieves BLEU-4 scores of 12.32 and 12.96, outperforming current state-of-the-art methods. Comprehensive human evaluation and interpretability analysis further confirm its effectiveness and practical utility.

๐Ÿ“ Abstract
Code review is essential for maintaining software quality but is labor-intensive. Automated code review generation offers a promising solution to this challenge. Both deep learning-based generative techniques and retrieval-based methods have demonstrated strong performance on this task. Despite these advances, generated reviews can still be off-point or overly general. To address these issues, we introduce Retrieval-Augmented Reviewer (RARe), which leverages Retrieval-Augmented Generation (RAG) to combine retrieval-based and generative methods, explicitly incorporating external domain knowledge into the code review process. RARe uses a dense retriever to select the most relevant reviews from the codebase, which then enrich the input for a neural generator, exploiting the in-context learning capacity of large language models (LLMs), to produce the final review. RARe outperforms state-of-the-art methods on two benchmark datasets, achieving BLEU-4 scores of 12.32 and 12.96, respectively. Its effectiveness is further validated through a detailed human evaluation and a case study using an interpretability tool, demonstrating its practical utility and reliability.
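The retrieval half of the pipeline described above can be sketched as follows. This is a minimal illustration, not RARe's implementation: a term-frequency vector stands in for the paper's learned dense encoder, and the toy corpus of (diff, review) pairs is invented for demonstration.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a learned dense encoder: a term-frequency vector.
    # RARe's retriever uses dense neural embeddings instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_diff, corpus, k=2):
    """Return the k historical (diff, review) pairs most similar to the query diff."""
    q = embed(query_diff)
    ranked = sorted(corpus, key=lambda pair: cosine(q, embed(pair[0])), reverse=True)
    return ranked[:k]

# Hypothetical historical review cases.
corpus = [
    ("if user == None: return", "Use 'is None' instead of '== None'."),
    ("for i in range(len(xs)): print(xs[i])", "Iterate directly over the list."),
    ("open('f.txt').read()", "File handle is never closed; use a with-block."),
]
top = retrieve("if result == None: raise Error", corpus, k=1)
```

For the query diff above, the nearest neighbor is the `== None` case, so its review would be passed on as an in-context example for the generator.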
Problem

Research questions and friction points this paper is trying to address.

Automating the labor-intensive code review process to maintain software quality
Addressing the tendency of automated code reviews to be off-point or overly general
Integrating external domain knowledge into automated code review generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines retrieval-based and generative methods via RAG
Uses a dense retriever to select the most relevant historical reviews
Enriches the neural generator's input with retrieved examples, leveraging the in-context learning capacity of LLMs
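The generation step described in these bullets amounts to assembling a few-shot prompt from the retrieved (diff, review) pairs and handing it to an LLM. The sketch below shows only the prompt assembly; the template wording and the example pair are illustrative assumptions, not the paper's actual prompt.

```python
def build_prompt(retrieved, query_diff):
    """Assemble a few-shot prompt: retrieved (diff, review) exemplars, then the query.

    The retrieved pairs give the LLM concrete in-context examples of how
    similar code changes were reviewed, steering it toward issue-focused
    feedback rather than generic comments.
    """
    parts = ["Generate a code review comment for the final diff.\n"]
    for diff, review in retrieved:
        parts.append(f"Diff:\n{diff}\nReview: {review}\n")
    parts.append(f"Diff:\n{query_diff}\nReview:")
    return "\n".join(parts)

# Hypothetical exemplar retrieved in the previous step.
exemplars = [
    ("if user == None: return", "Use 'is None' instead of '== None'."),
]
prompt = build_prompt(exemplars, "while x == None: pass")
```

The completed prompt ends at `Review:`, leaving the LLM to fill in the comment for the query diff.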