Streamlining Security Vulnerability Triage with Large Language Models

📅 2025-01-31
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the inefficiency and high error rate of manual vulnerability triage, this paper proposes CASEY, an automated vulnerability analysis approach built on large language models (specifically GPT). CASEY jointly performs CWE classification and severity assessment, using prompt engineering that injects contextual information at varying levels of granularity. The approach is evaluated on an augmented version of the National Vulnerability Database (NVD) with both quantitative and qualitative metrics. On this benchmark, CASEY achieves 68% accuracy for CWE identification, 73.6% for severity assessment, and 51.2% for the joint task, indicating substantial potential gains in triage efficiency and decision consistency.
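The prompting idea described above can be illustrated with a minimal sketch. The prompt wording, the JSON response schema, and the function names below are illustrative assumptions, not the paper's actual templates:

```python
import json

def build_triage_prompt(description, context=None):
    """Build a CASEY-style triage prompt (hypothetical wording; the
    paper's exact prompt templates are not reproduced here)."""
    parts = [
        "You are a security analyst triaging a vulnerability report.",
        f"Bug description:\n{description}",
    ]
    if context:
        # Optional extra granularity, e.g. the affected component or a stack trace.
        parts.append(f"Additional context:\n{context}")
    parts.append(
        'Respond with JSON: {"cwe": "CWE-###", '
        '"severity": "LOW|MEDIUM|HIGH|CRITICAL"}.'
    )
    return "\n\n".join(parts)

def parse_triage_response(raw):
    """Parse the model's JSON reply into a (cwe, severity) pair."""
    data = json.loads(raw)
    return data["cwe"], data["severity"].upper()
```

The prompt string would be sent to the LLM (e.g. via a chat-completion API call); the `context` argument models the varying-granularity contextual information the summary refers to.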

šŸ“ Abstract
Bug triaging for security vulnerabilities is a critical part of software maintenance, ensuring that the most pressing vulnerabilities are addressed promptly to safeguard system integrity and user data. However, the process is resource-intensive and comes with challenges, including classifying software vulnerabilities, assessing their severity, and managing a high volume of bug reports. In this paper, we present CASEY, a novel approach that leverages Large Language Models (in our case, the GPT model) to automate the identification of Common Weakness Enumerations (CWEs) of security bugs and assess their severity. CASEY employs prompt engineering techniques and incorporates contextual information at varying levels of granularity to assist in the bug triaging process. We evaluated CASEY using an augmented version of the National Vulnerability Database (NVD), employing quantitative and qualitative metrics to measure its performance across CWE identification, severity assessment, and their combined analysis. CASEY achieved a CWE identification accuracy of 68%, a severity identification accuracy of 73.6%, and a combined accuracy of 51.2% for identifying both. These results demonstrate the potential of LLMs in identifying CWEs and severity levels, streamlining software vulnerability management, and improving the efficiency of security vulnerability triaging workflows.
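The combined metric reported in the abstract counts a report as correct only when both the predicted CWE and the predicted severity match the ground truth, which is why it is lower than either individual accuracy. A minimal sketch (function name and tuple layout are assumptions for illustration):

```python
def combined_accuracy(preds, golds):
    """Fraction of reports where both the CWE and the severity
    prediction match the ground truth (the paper's joint metric)."""
    hits = sum(
        1
        for (p_cwe, p_sev), (g_cwe, g_sev) in zip(preds, golds)
        if p_cwe == g_cwe and p_sev == g_sev
    )
    return hits / len(golds)
```

For example, with two reports where one prediction matches on both fields and the other matches only on CWE, the combined accuracy is 0.5 even though CWE accuracy is 1.0.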
Problem

Research questions and friction points this paper is trying to address.

Software Security
Vulnerability Assessment
Information Protection
Innovation

Methods, ideas, or system contributions that make the work stand out.

CASEY
Language Model
Software Security Assessment
Mohammad Jalili Torkamani
School of Computing, University of Nebraska–Lincoln, Lincoln, USA
N. Joey
School of Computing, University of Nebraska–Lincoln, Lincoln, USA
Nikita Mehrotra
Microsoft
Program Analysis, Deep Learning
Mahinthan Chandramohan
Oracle Labs, Brisbane, Australia
Padmanabhan Krishnan
Oracle Labs, Brisbane, Australia
Rahul Purandare
Associate Professor, University of Nebraska–Lincoln
Program Analysis, Software Testing, AI for Software Engineering, Code Comprehension