Streamlining Security Vulnerability Triage with Large Language Models

📅 2025-01-31
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the inefficiency and high error rate of manual vulnerability triage, this paper proposes CASEY, an automated vulnerability analysis approach built on large language models (specifically GPT). CASEY jointly performs CWE classification and severity assessment, using prompt engineering that injects contextual information at varying levels of granularity. The approach is evaluated on an augmented version of the National Vulnerability Database (NVD) with both quantitative and qualitative metrics. On this benchmark, CASEY achieves 68% accuracy for CWE identification, 73.6% for severity assessment, and 51.2% for the joint task, indicating substantial potential gains in triage efficiency and decision consistency.
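The prompting idea described above can be illustrated with a minimal sketch. The prompt wording, the JSON response schema, and the function names below are illustrative assumptions, not the paper's actual templates:

```python
import json

def build_triage_prompt(description, context=None):
    """Build a CASEY-style triage prompt (hypothetical wording; the
    paper's exact prompt templates are not reproduced here)."""
    parts = [
        "You are a security analyst triaging a vulnerability report.",
        f"Bug description:\n{description}",
    ]
    if context:
        # Optional extra granularity, e.g. the affected component or a stack trace.
        parts.append(f"Additional context:\n{context}")
    parts.append(
        'Respond with JSON: {"cwe": "CWE-###", '
        '"severity": "LOW|MEDIUM|HIGH|CRITICAL"}.'
    )
    return "\n\n".join(parts)

def parse_triage_response(raw):
    """Parse the model's JSON reply into a (cwe, severity) pair."""
    data = json.loads(raw)
    return data["cwe"], data["severity"].upper()
```

The prompt string would be sent to the LLM (e.g. via a chat-completion API call); the `context` argument models the varying-granularity contextual information the summary refers to.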

šŸ“ Abstract
Bug triaging for security vulnerabilities is a critical part of software maintenance, ensuring that the most pressing vulnerabilities are addressed promptly to safeguard system integrity and user data. However, the process is resource-intensive and comes with challenges, including classifying software vulnerabilities, assessing their severity, and managing a high volume of bug reports. In this paper, we present CASEY, a novel approach that leverages Large Language Models (in our case, the GPT model) to automate the identification of Common Weakness Enumerations (CWEs) of security bugs and assess their severity. CASEY employs prompt engineering techniques and incorporates contextual information at varying levels of granularity to assist in the bug triaging process. We evaluated CASEY using an augmented version of the National Vulnerability Database (NVD), employing quantitative and qualitative metrics to measure its performance across CWE identification, severity assessment, and their combined analysis. CASEY achieved a CWE identification accuracy of 68%, a severity identification accuracy of 73.6%, and a combined accuracy of 51.2% for identifying both. These results demonstrate the potential of LLMs in identifying CWEs and severity levels, streamlining software vulnerability management, and improving the efficiency of security vulnerability triaging workflows.
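The combined metric reported in the abstract counts a report as correct only when both the predicted CWE and the predicted severity match the ground truth, which is why it is lower than either individual accuracy. A minimal sketch (function name and tuple layout are assumptions for illustration):

```python
def combined_accuracy(preds, golds):
    """Fraction of reports where both the CWE and the severity
    prediction match the ground truth (the paper's joint metric)."""
    hits = sum(
        1
        for (p_cwe, p_sev), (g_cwe, g_sev) in zip(preds, golds)
        if p_cwe == g_cwe and p_sev == g_sev
    )
    return hits / len(golds)
```

For example, with two reports where one prediction matches on both fields and the other matches only on CWE, the combined accuracy is 0.5 even though CWE accuracy is 1.0.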
Problem

Research questions and friction points this paper is trying to address.

Software Security
Vulnerability Assessment
Information Protection
Innovation

Methods, ideas, or system contributions that make the work stand out.

CASEY
Language Model
Software Security Assessment
Mohammad Jalili Torkamani
School of Computing, University of Nebraska–Lincoln, Lincoln, USA
N. Joey
School of Computing, University of Nebraska–Lincoln, Lincoln, USA
Nikita Mehrotra
Microsoft
Program Analysis, Deep Learning
Mahinthan Chandramohan
Oracle Labs, Brisbane, Australia
Padmanabhan Krishnan
Oracle Labs, Brisbane, Australia
Rahul Purandare
Associate Professor, University of Nebraska–Lincoln
Program Analysis, Software Testing, AI for Software Engineering, Code Comprehension