AI Summary
Existing LLM-based code review methods lack a security-specific focus, suffering from scarce security-relevant training data, the absence of security-aware evaluation metrics, and severe hallucination. This paper proposes a security-aware code review framework that addresses these challenges by: (1) constructing the first high-quality, security-focused annotated dataset for code review; (2) designing a security-knowledge-enhanced fine-tuning strategy that integrates domain-specific rules and vulnerability patterns; (3) incorporating a retrieval-augmented generation (RAG) mechanism to improve factual consistency and mitigate hallucination; and (4) introducing SecureBLEU, a dedicated evaluation metric that quantifies review quality along three dimensions: security relevance, actionability, and accuracy. Experiments demonstrate significant improvements over state-of-the-art baselines: +18.7% in security vulnerability detection rate and +22.3% in review practicality.
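The summary names SecureBLEU's three dimensions but not its formula. A minimal illustrative sketch of how such a metric might be assembled, assuming it blends a BLEU-style clipped n-gram precision with a lexical security-relevance score; the keyword list, scoring functions, and `alpha` weight below are hypothetical, not the paper's actual definition:

```python
from collections import Counter

# Hypothetical keyword list standing in for the paper's security knowledge base.
SECURITY_TERMS = {"overflow", "injection", "sanitize", "validate", "xss", "cve"}

def ngram_overlap(candidate, reference, n=1):
    """Modified n-gram precision with clipped counts, as in BLEU."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return clipped / total if total else 0.0

def security_relevance(comment):
    """Fraction of security terms mentioned; a stand-in for the
    metric's security-relevance dimension."""
    tokens = set(comment.lower().split())
    return len(tokens & SECURITY_TERMS) / len(SECURITY_TERMS)

def secure_bleu(candidate, reference, alpha=0.5):
    """Hypothetical blend: overlap accuracy plus security relevance."""
    return (alpha * ngram_overlap(candidate, reference)
            + (1 - alpha) * security_relevance(candidate))
```

The point of such a blend is that a fluent but security-blind comment scores low even when it overlaps the reference, which plain BLEU cannot capture.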
Abstract
Identifying and addressing security issues during the early phases of the development lifecycle is critical for mitigating long-term negative impacts on software systems. Code review serves as an effective practice that enables developers to check their teammates' code before integration into the codebase. To streamline the generation of review comments, various automated code review approaches have been proposed, among which LLM-based methods have significantly advanced the capabilities of automated review generation. However, existing models primarily focus on general-purpose code review, and their effectiveness in identifying and addressing security-related issues remains underexplored. Moreover, adapting existing code review approaches to target security issues faces substantial challenges, including data scarcity and inadequate evaluation metrics. To address these limitations, we propose SecureReviewer, a new approach designed to enhance LLMs' ability to identify and resolve security-related issues during code review. Specifically, we first construct a dataset tailored for training and evaluating secure code review capabilities. Leveraging this dataset, we fine-tune LLMs with our proposed secure-aware fine-tuning strategy to generate review comments that can effectively identify security issues and provide fix suggestions. To mitigate hallucination in LLMs and enhance the reliability of their outputs, we integrate retrieval-augmented generation (RAG), which grounds the generated comments in domain-specific security knowledge. Additionally, we introduce SecureBLEU, a new evaluation metric designed to assess the effectiveness of review comments in addressing security issues. Experimental results demonstrate that SecureReviewer outperforms state-of-the-art baselines in both security issue detection accuracy and the overall quality and practical utility of generated review comments.
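The abstract describes RAG only at a high level: retrieve relevant security knowledge and ground the generated comment in it. A minimal sketch of that retrieve-then-prompt flow, assuming a bag-of-words cosine retriever over a toy knowledge base; the CWE snippets, `retrieve`, and `build_prompt` below are illustrative stand-ins, not the paper's implementation:

```python
import math
from collections import Counter

# Toy snippets standing in for the paper's curated security knowledge base.
KNOWLEDGE = [
    "CWE-89: SQL injection: use parameterized queries instead of string concatenation.",
    "CWE-79: Cross-site scripting: escape user-controlled output before rendering.",
    "CWE-120: Buffer overflow: check bounds before copying into fixed-size buffers.",
]

def bow(text):
    """Bag-of-words vector; trailing punctuation stripped from tokens."""
    return Counter(t.strip(".,:;") for t in text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=1):
    """Return the k knowledge snippets most similar to the query."""
    q = bow(query)
    return sorted(KNOWLEDGE, key=lambda doc: cosine(q, bow(doc)), reverse=True)[:k]

def build_prompt(diff):
    """Prepend retrieved security knowledge to the review prompt, so the
    LLM's comment is grounded in it rather than free-form."""
    context = "\n".join(retrieve(diff, k=2))
    return ("Review this change for security issues.\n\n"
            f"Relevant security knowledge:\n{context}\n\nDiff:\n{diff}")
```

In a real system the retriever would use learned embeddings rather than word counts, but the grounding idea is the same: the model cites retrieved facts instead of hallucinating them.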