🤖 AI Summary
To address the weak contextual understanding of rule-based systems and the coverage–precision trade-off of learning-based models for code review, this paper proposes a three-stage framework in which knowledge-driven and data-driven components collaborate. In the data-augmentation stage, static-analysis results are integrated to improve training-data quality; at inference, retrieval-augmented generation (RAG) enables context-aware review generation; and in post-processing, Naive Concatenation of Outputs (NCO) fuses suggestions from both sources. This is the first work to systematically integrate static analyzers (knowledge-driven) with large language models (learning-driven) across the entire pipeline: data preparation, inference, and post-processing. Evaluated on a real-world code-review dataset, the method outperforms both pure rule-based tools and fine-tuned LLMs, improving the relevance, completeness, explainability, coverage, and precision of review comments.
📝 Abstract
Code review is a crucial but often complex, subjective, and time-consuming activity in software development. Over the past decades, significant efforts have been made to automate this process. Early approaches focused on knowledge-based systems (KBS) that apply rule-based mechanisms to detect code issues, providing precise feedback but struggling with complex, context-dependent cases. More recent work has shifted toward fine-tuning pre-trained language models for code review, enabling broader issue coverage but often at the expense of precision. In this paper, we propose a hybrid approach that combines the strengths of KBS and learning-based systems (LBS) to generate high-quality, comprehensive code reviews. Our method integrates knowledge at three distinct stages of the language model pipeline: during data preparation (Data-Augmented Training, DAT), at inference (Retrieval-Augmented Generation, RAG), and after inference (Naive Concatenation of Outputs, NCO). We empirically evaluate our combination strategies against standalone KBS and LBS fine-tuned on a real-world dataset. Our results show that these hybrid strategies enhance the relevance, completeness, and overall quality of review comments, effectively bridging the gap between rule-based tools and deep learning models.
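The three integration points named in the abstract (DAT, RAG, NCO) can be pictured with a minimal sketch. This is not the paper's implementation: the rule checks, function names, and data shapes below are all hypothetical stand-ins chosen to show where knowledge from a rule-based analyzer enters the pipeline.

```python
# Hypothetical sketch of the three hybrid strategies; all names and
# rules are illustrative, not the paper's actual system.

def kbs_review(code: str) -> list[str]:
    """Stand-in for a knowledge-based system: fixed, precise rules."""
    findings = []
    if "eval(" in code:
        findings.append("Avoid eval(): it executes arbitrary code.")
    if "== None" in code:
        findings.append("Use 'is None' instead of '== None'.")
    return findings

def augment_training_pair(code: str, human_review: str) -> dict:
    """DAT: enrich a training example with static-analysis findings
    so the model is fine-tuned on knowledge-augmented data."""
    return {
        "input": code,
        "context": kbs_review(code),  # knowledge injected at data prep
        "target": human_review,
    }

def rag_prompt(code: str) -> str:
    """RAG: prepend retrieved KBS findings to the inference prompt."""
    context = "\n".join(f"- {f}" for f in kbs_review(code))
    return (f"Static analysis findings:\n{context}\n\n"
            f"Review this code:\n{code}")

def nco(lbs_comments: list[str], code: str) -> list[str]:
    """NCO: naively concatenate the model's comments with the KBS
    findings, dropping exact duplicates while preserving order."""
    merged, seen = [], set()
    for comment in lbs_comments + kbs_review(code):
        if comment not in seen:
            seen.add(comment)
            merged.append(comment)
    return merged

snippet = "if x == None:\n    eval(user_input)"
merged = nco(["Consider validating user_input first."], snippet)
```

In this toy form, NCO is pure post-processing (the model never sees the rules), RAG conditions generation on them at inference time, and DAT bakes them into the training data itself.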