AI Summary
Existing LLM-based code review methods lack a security-specific focus, suffering from scarce security-relevant training data, the absence of security-aware evaluation metrics, and severe hallucination. This paper proposes a security-aware code review framework that addresses these challenges by: (1) constructing the first high-quality, security-focused annotated dataset for code review; (2) designing a security-knowledge-enhanced fine-tuning strategy that integrates domain-specific rules and vulnerability patterns; (3) incorporating a retrieval-augmented generation (RAG) mechanism to improve factual consistency and mitigate hallucination; and (4) introducing SecureBLEU, a dedicated evaluation metric that quantifies review quality along three dimensions: security relevance, actionability, and accuracy. Experiments demonstrate significant improvements over state-of-the-art baselines: +18.7% in security vulnerability detection rate and +22.3% in review practicality.
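The summary names SecureBLEU's three dimensions but not its formula. A minimal illustrative sketch of how such a metric might be assembled, assuming it blends a BLEU-style clipped n-gram precision with a lexical security-relevance score; the keyword list, scoring functions, and `alpha` weight below are hypothetical, not the paper's actual definition:

```python
from collections import Counter

# Hypothetical keyword list standing in for the paper's security knowledge base.
SECURITY_TERMS = {"overflow", "injection", "sanitize", "validate", "xss", "cve"}

def ngram_overlap(candidate, reference, n=1):
    """Modified n-gram precision with clipped counts, as in BLEU."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    clipped = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return clipped / total if total else 0.0

def security_relevance(comment):
    """Fraction of security terms mentioned; a stand-in for the
    metric's security-relevance dimension."""
    tokens = set(comment.lower().split())
    return len(tokens & SECURITY_TERMS) / len(SECURITY_TERMS)

def secure_bleu(candidate, reference, alpha=0.5):
    """Hypothetical blend: overlap accuracy plus security relevance."""
    return (alpha * ngram_overlap(candidate, reference)
            + (1 - alpha) * security_relevance(candidate))
```

The point of such a blend is that a fluent but security-blind comment scores low even when it overlaps the reference, which plain BLEU cannot capture.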
Abstract
Identifying and addressing security issues during the early phases of the development lifecycle is critical for mitigating long-term negative impacts on software systems. Code review serves as an effective practice that enables developers to check their teammates' code before integration into the codebase. To streamline the generation of review comments, various automated code review approaches have been proposed, among which LLM-based methods have significantly advanced the capabilities of automated review generation. However, existing models primarily focus on general-purpose code review, and their effectiveness in identifying and addressing security-related issues remains underexplored. Moreover, adapting existing code review approaches to target security issues faces substantial challenges, including data scarcity and inadequate evaluation metrics. To address these limitations, we propose SecureReviewer, a new approach designed to enhance LLMs' ability to identify and resolve security-related issues during code review. Specifically, we first construct a dataset tailored for training and evaluating secure code review capabilities. Leveraging this dataset, we fine-tune LLMs with our proposed secure-aware fine-tuning strategy to generate review comments that can effectively identify security issues and provide fix suggestions. To mitigate hallucination in LLMs and enhance the reliability of their outputs, we integrate retrieval-augmented generation (RAG), which grounds the generated comments in domain-specific security knowledge. Additionally, we introduce SecureBLEU, a new evaluation metric designed to assess the effectiveness of review comments in addressing security issues. Experimental results demonstrate that SecureReviewer outperforms state-of-the-art baselines in both security issue detection accuracy and the overall quality and practical utility of generated review comments.
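The abstract describes RAG only at a high level: retrieve relevant security knowledge and ground the generated comment in it. A minimal sketch of that retrieve-then-prompt flow, assuming a bag-of-words cosine retriever over a toy knowledge base; the CWE snippets, `retrieve`, and `build_prompt` below are illustrative stand-ins, not the paper's implementation:

```python
import math
from collections import Counter

# Toy snippets standing in for the paper's curated security knowledge base.
KNOWLEDGE = [
    "CWE-89: SQL injection: use parameterized queries instead of string concatenation.",
    "CWE-79: Cross-site scripting: escape user-controlled output before rendering.",
    "CWE-120: Buffer overflow: check bounds before copying into fixed-size buffers.",
]

def bow(text):
    """Bag-of-words vector; trailing punctuation stripped from tokens."""
    return Counter(t.strip(".,:;") for t in text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def retrieve(query, k=1):
    """Return the k knowledge snippets most similar to the query."""
    q = bow(query)
    return sorted(KNOWLEDGE, key=lambda doc: cosine(q, bow(doc)), reverse=True)[:k]

def build_prompt(diff):
    """Prepend retrieved security knowledge to the review prompt, so the
    LLM's comment is grounded in it rather than free-form."""
    context = "\n".join(retrieve(diff, k=2))
    return ("Review this change for security issues.\n\n"
            f"Relevant security knowledge:\n{context}\n\nDiff:\n{diff}")
```

In a real system the retriever would use learned embeddings rather than word counts, but the grounding idea is the same: the model cites retrieved facts instead of hallucinating them.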