🤖 AI Summary
To address the low efficiency of vulnerability identification in open-source software and the neglect of formatting (rich-text) information in issue reports, this paper proposes a reasoning-guided vulnerability identification method. It combines the logical reasoning capabilities of large language models (LLMs) with structured rich-text analysis. It first builds a vulnerability reasoning database from historical issue reports, then dynamically generates reasoning guidance by retrieving similar historical cases. The work is presented as the first to jointly model LLM-based reasoning and rich-text features. It improves both vulnerability identification and CWE-ID prediction on imbalanced data. Experiments on 973,572 issue reports show improvements over the best baseline of 11.0% in F1-score, 10.5% in Macro-F1, and 20.2% in AUPRC. The method also needs half the time cost of baseline reasoning approaches. In real-world deployment on 2024 GitHub issue reports, the method identified 30 emerging vulnerabilities across 10 open-source projects, and 11 of them were assigned CVE-IDs.
📝 Abstract
Software vulnerabilities exist in open-source software (OSS), and developers who discover them may submit issue reports (IRs) describing their details. Security practitioners spend considerable time manually identifying vulnerability-related IRs in the community, and attackers may exploit this time gap to harm systems. Researchers have previously proposed automatic approaches to help identify vulnerability-related IRs, but these works focus on textual descriptions and lack a comprehensive analysis of IRs' rich-text information. In this paper, we propose VulRTex, a reasoning-guided approach that identifies vulnerability-related IRs using their rich-text information. In particular, VulRTex first uses the reasoning ability of a Large Language Model (LLM) to prepare a Vulnerability Reasoning Database from historical IRs. It then retrieves relevant cases from this database to generate reasoning guidance. This guidance directs the LLM to identify vulnerabilities through reasoning analysis of the target IRs' rich-text information. To evaluate VulRTex, we conduct experiments on 973,572 IRs. The results show that VulRTex achieves the highest performance in identifying vulnerability-related IRs and predicting CWE-IDs when the dataset is imbalanced. It outperforms the best baseline by +11.0% F1, +20.2% AUPRC, and +10.5% Macro-F1, at half the time cost of baseline reasoning approaches. Furthermore, we applied VulRTex to 2024 GitHub IRs, where it identified 30 emerging vulnerabilities across 10 representative OSS projects. Of these, 11 have been assigned CVE-IDs, which demonstrates VulRTex's practicality.
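The abstract describes the retrieval-and-guidance step only at a high level. That step retrieves similar historical cases from the reasoning database and turns them into guidance for the LLM, alongside the target IR's rich-text parts. The Python sketch below illustrates the idea under explicit assumptions. The `ReasoningCase` record, every helper name, the bag-of-words cosine retrieval, and the Markdown splitting of rich text are hypothetical stand-ins, not VulRTex's actual encoder, database schema, or prompt. Each case's reasoning text is assumed to have been generated by an LLM beforehand, and the final LLM call is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional

FENCE = "`" * 3  # Markdown code fence, built indirectly so the snippet embeds cleanly


@dataclass
class ReasoningCase:
    """One entry of a hypothetical vulnerability reasoning database."""
    ir_text: str              # historical issue report (Markdown rich text)
    is_vulnerability: bool
    cwe_id: Optional[str]     # e.g. "CWE-79"; None for non-vulnerability IRs
    reasoning: str            # reasoning text, assumed to be pre-generated by an LLM


def tokenize(text: str) -> Counter:
    # Lowercased word counts: a simple stand-in for a real text encoder.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_rich_text(ir_markdown: str) -> dict[str, list[str]]:
    # Separate fenced code blocks and URLs from the prose so each part can be labelled.
    code = re.findall(FENCE + r"[^\n]*\n(.*?)" + FENCE, ir_markdown, flags=re.S)
    links = re.findall(r"https?://\S+", ir_markdown)
    prose = re.sub(FENCE + r".*?" + FENCE, "", ir_markdown, flags=re.S).strip()
    return {"prose": [prose], "code": [c.strip() for c in code], "links": links}


def retrieve(db: list[ReasoningCase], target_ir: str, k: int = 2) -> list[ReasoningCase]:
    # Rank historical cases by similarity to the target IR and keep the top k.
    query = tokenize(target_ir)
    return sorted(db, key=lambda c: cosine(query, tokenize(c.ir_text)), reverse=True)[:k]


def build_guided_prompt(db: list[ReasoningCase], target_ir: str, k: int = 2) -> str:
    # Combine reasoning from retrieved cases with the target IR's labelled rich-text parts.
    lines = ["Reference cases:"]
    for i, case in enumerate(retrieve(db, target_ir, k), start=1):
        label = (case.cwe_id or "vulnerability") if case.is_vulnerability else "not a vulnerability"
        lines.append(f"[{i}] ({label}) {case.reasoning}")
    parts = split_rich_text(target_ir)
    lines.append("Target IR text:\n" + "\n".join(parts["prose"]))
    lines.append("Target IR code blocks:\n" + ("\n".join(parts["code"]) or "(none)"))
    lines.append("Target IR links:\n" + ("\n".join(parts["links"]) or "(none)"))
    lines.append("Following the reasoning in the reference cases, decide whether the target IR "
                 "is vulnerability-related and, if so, give the most likely CWE-ID.")
    return "\n\n".join(lines)
```

In this sketch, `build_guided_prompt(db, target_ir)` returns a single string that would be sent to an LLM. Bag-of-words similarity keeps the example dependency-free; a practical system would more likely use learned embeddings for retrieval.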