🤖 AI Summary
Problem: Fine-grained classification of code review comments suffers from high annotation costs and poor performance on infrequent categories. Method: This work pioneers the application of large language models (LLMs) to zero-shot and few-shot classification of 17 review comment types, using prompt engineering and category semantic enhancement to mitigate long-tail distribution challenges without requiring extensive labeled data. Contribution/Results: The approach surpasses the state-of-the-art deep learning model in overall accuracy and achieves an average +12.3% F1-score improvement across five critical low-frequency categories, marking the first demonstration of balanced performance between high- and low-frequency classes. By significantly reducing reliance on manual annotation, this work establishes a scalable, low-resource paradigm for fine-grained software engineering text analysis.
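To make the zero-shot and few-shot setup concrete, here is a minimal sketch of how such a prompt might be assembled. It is not the authors' prompt. The category names and descriptions are hypothetical placeholders (the 17 categories are not listed here), and attaching a short description to each category is only one plausible reading of "category semantic enhancement". The helper names `build_prompt` and `parse_label` are likewise invented for illustration, and no particular LLM API is assumed.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical category names and descriptions, for illustration only;
# the 17 categories used in the study are not listed in this section.
CATEGORIES = {
    "functional_defect": "The comment points out incorrect behavior or a logic error.",
    "naming": "The comment asks for a clearer identifier name.",
    "praise": "The comment expresses approval without requesting a change.",
}


def build_prompt(comment: str,
                 categories: Mapping[str, str],
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Assemble a classification prompt; zero-shot when `examples` is empty."""
    lines = ["Classify the code review comment into exactly one category.",
             "",
             "Categories:"]
    # Each category carries a short description (assumed form of semantic enhancement).
    for name, description in categories.items():
        lines.append(f"- {name}: {description}")
    # Few-shot demonstrations, if any, use the same layout as the final query.
    for text, label in examples:
        lines += ["", f"Comment: {text}", f"Category: {label}"]
    lines += ["", f"Comment: {comment}", "Category:"]
    return "\n".join(lines)


def parse_label(response: str, categories: Mapping[str, str]) -> Optional[str]:
    """Map a raw model reply to a known category name, or None if none matches."""
    cleaned = response.strip().lower()
    # Check longer names first so a name that is a prefix of another cannot shadow it.
    for name in sorted(categories, key=len, reverse=True):
        if cleaned.startswith(name.lower()):
            return name
    return None
```

Calling `build_prompt(comment, CATEGORIES)` yields a zero-shot prompt, and passing a few `(comment, label)` pairs as `examples` turns it into a few-shot prompt. The returned string would then be sent to whichever model is being evaluated, and `parse_label` would normalize the reply.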
📝 Abstract
Code review is a crucial practice in software development. Because modern code review is lightweight, it can surface a wide variety of issues, some of them trivial. Research has investigated automated approaches to classify review comments to gauge the effectiveness of code reviews. However, previous studies have primarily relied on supervised machine learning, which requires extensive manual annotation to train the models effectively. To address this limitation, we explore the potential of using Large Language Models (LLMs) to classify code review comments. We assess the performance of LLMs in classifying 17 categories of code review comments. Our results show that LLMs can classify code review comments, outperforming the state-of-the-art approach that uses a trained deep learning model. In particular, LLMs achieve better accuracy in classifying the five most useful categories, which the state-of-the-art approach struggles with because it has few training examples for them. Because they do not rely solely on a specific, small training data distribution, LLMs provide balanced performance across high- and low-frequency categories. These results suggest that LLMs could offer a scalable solution for code review analytics to improve the effectiveness of the code review process.
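One way to check whether performance is balanced across high- and low-frequency categories is to compute a score per category and then average the scores without weighting by frequency. The sketch below does this with per-category F1 and macro-averaged F1 in plain Python. It is an illustration under assumptions: the abstract does not specify how its metrics were computed, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def per_category_f1(gold: Sequence[str], predicted: Sequence[str]) -> Dict[str, float]:
    """Return the F1 score of each category seen in the gold or predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted category gains a false positive
            fn[g] += 1  # gold category gains a false negative
    scores = {}
    for label in set(gold) | set(predicted):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(scores: Dict[str, float]) -> float:
    """Unweighted mean of per-category F1, so rare categories count as much as common ones."""
    return sum(scores.values()) / len(scores) if scores else 0.0
```

Because the macro average ignores category frequency, a classifier that does well only on the common categories scores lower here than it would on overall accuracy. That gap is the imbalance the abstract refers to.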