🤖 AI Summary
Existing automated secure code review methods are limited in precision, coverage, and evaluation completeness. To address these challenges, this paper proposes iCodeReviewer, a large language model (LLM)-based approach built on a mixture-of-prompts architecture that combines multiple prompt experts with a feature-driven routing algorithm. The router activates only the prompt experts relevant to the code features found in the input program, thereby reducing hallucination-induced false positives. Evaluated on an internal dataset, the method achieves an F1 score of 63.98% for security issue identification and localization, and its generated review comments reach an acceptance rate of up to 84% in production environments. These results suggest that routing among specialized prompt experts can improve both the coverage and the practical utility of automated secure code review.
📝 Abstract
Code review is an essential process for ensuring software quality, as it identifies potential issues at an early stage of software development. Among all software issues, security issues are the most important to identify, as they can easily lead to severe software crashes and service disruptions. Recent research has been devoted to automated approaches that reduce the manual effort required in the secure code review process. Despite this progress, current automated approaches to secure code review, including static analysis, deep learning models, and prompting approaches, still face the challenges of limited precision and coverage, and lack comprehensive evaluation.
To mitigate these challenges, we propose iCodeReviewer, an automated secure code review approach based on large language models (LLMs). iCodeReviewer leverages a novel mixture-of-prompts architecture that incorporates many prompt experts to improve the coverage of security issues. Each prompt expert is a dynamic prompt pipeline that checks for the existence of a specific security issue. iCodeReviewer also implements a routing algorithm that activates only the necessary prompt experts based on the code features in the input program, reducing the false positives induced by LLM hallucination. Experimental results on our internal dataset demonstrate the effectiveness of iCodeReviewer in security issue identification and localization, with an F1 score of 63.98%. The review comments generated by iCodeReviewer also achieve a high acceptance rate of up to 84% when it is deployed in production environments.
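The routing idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the feature patterns, the expert names, the `PromptExpert` class, and the `route` and `review` functions are all hypothetical, and each expert is reduced to a single prompt template rather than a full dynamic prompt pipeline. The sketch only shows how experts whose required code features are absent from the input are never sent to the LLM, which is the mechanism the abstract credits with reducing false positives.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical code features: feature name -> regex over the source text.
# A real system would likely rely on richer program analysis.
FEATURE_PATTERNS = {
    "sql_call": re.compile(r"\.execute\s*\("),
    "shell_call": re.compile(r"\b(?:os\.system|subprocess\.\w+)\s*\("),
    "file_open": re.compile(r"\bopen\s*\("),
}


@dataclass(frozen=True)
class PromptExpert:
    """One expert checks for one kind of security issue (assumed structure)."""
    name: str
    required_features: frozenset
    prompt_template: str  # must contain a {code} placeholder


# Illustrative experts only; the paper does not list its experts here.
EXPERTS = [
    PromptExpert("sql_injection", frozenset({"sql_call"}),
                 "Check this code for SQL injection:\n{code}"),
    PromptExpert("command_injection", frozenset({"shell_call"}),
                 "Check this code for command injection:\n{code}"),
    PromptExpert("path_traversal", frozenset({"file_open"}),
                 "Check this code for path traversal:\n{code}"),
]


def extract_features(code: str) -> set:
    """Return the names of all features whose pattern occurs in the code."""
    return {name for name, pattern in FEATURE_PATTERNS.items()
            if pattern.search(code)}


def route(code: str, experts=EXPERTS) -> list:
    """Activate only the experts whose required features are all present."""
    features = extract_features(code)
    return [e for e in experts if e.required_features <= features]


def review(code: str, llm: Callable[[str], str], experts=EXPERTS) -> dict:
    """Query the LLM once per activated expert; skipped experts cost nothing."""
    return {e.name: llm(e.prompt_template.format(code=code))
            for e in route(code, experts)}
```

Here `llm` is any callable that maps a prompt string to a response string, so the sketch can be exercised with a stub before wiring in a real model client. Requiring all of an expert's features to be present is one possible activation rule; a scoring or threshold-based rule would fit the same interface.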