🤖 AI Summary
This study addresses the subjectivity and poor timeliness of manual CVSS scoring by conducting the first systematic evaluation of large language models (LLMs) for automated CVSS scoring across all three metric categories: Base, Temporal, and Environmental. We propose a multi-strategy prompt engineering framework to optimize LLM generation of CVSS vectors and benchmark it against an embedding-based supervised classification model. Results show that LLMs achieve high accuracy on objective metrics (e.g., Attack Vector, Privileges Required) but significantly underperform on the subjective Confidentiality, Integrity, and Availability impact dimensions, where the embedding model excels. A hybrid approach combining both methods improves consistency and reliability across all CVSS dimensions. This work establishes a novel paradigm for automated vulnerability severity assessment, advancing CVSS evaluation toward greater efficiency, reproducibility, and scalability.
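The hybrid approach can be pictured as per-metric routing: trust the LLM on the objective Base metrics and the embedding classifier on the subjective C/I/A impacts, then assemble a single CVSS v3.1 vector string. The sketch below is a minimal illustration of that idea, not the paper's implementation; `llm_predict` and `embedding_predict` are hypothetical stubs standing in for the two real models.

```python
# Hypothetical hybrid CVSS scorer: route objective metrics to an LLM
# and the subjective C/I/A impact metrics to an embedding classifier.

OBJECTIVE = ["AV", "AC", "PR", "UI", "S"]   # objective Base metrics
SUBJECTIVE = ["C", "I", "A"]                # subjective impact metrics

def llm_predict(description: str) -> dict:
    """Stub for an LLM prompted to emit CVSS metric values (canned output)."""
    return {"AV": "N", "AC": "L", "PR": "N", "UI": "N", "S": "U",
            "C": "L", "I": "L", "A": "L"}

def embedding_predict(description: str) -> dict:
    """Stub for the supervised embedding classifier, C/I/A only (canned output)."""
    return {"C": "H", "I": "H", "A": "N"}

def hybrid_cvss_vector(description: str) -> str:
    llm = llm_predict(description)
    emb = embedding_predict(description)
    merged = {m: llm[m] for m in OBJECTIVE}         # LLM wins on objective metrics
    merged.update({m: emb[m] for m in SUBJECTIVE})  # embeddings win on C/I/A
    parts = ["CVSS:3.1"] + [f"{m}:{merged[m]}" for m in OBJECTIVE + SUBJECTIVE]
    return "/".join(parts)

print(hybrid_cvss_vector("SQL injection in login form"))
```

With these canned stubs the merged vector is `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N`; in a real system the routing table itself could be learned from per-metric validation accuracy rather than fixed by hand.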
📝 Abstract
Common Vulnerabilities and Exposures (CVE) records are fundamental to cybersecurity, offering unique identifiers for publicly known software and system vulnerabilities. Each CVE is typically assigned a Common Vulnerability Scoring System (CVSS) score to support risk prioritization and remediation. However, score inconsistencies often arise due to subjective interpretations of certain metrics. As the number of new CVEs continues to grow rapidly, automation is increasingly necessary to ensure timely and consistent scoring. While prior studies have explored automated methods, the application of Large Language Models (LLMs), despite their recent popularity, remains relatively underexplored. In this work, we evaluate the effectiveness of LLMs in generating CVSS scores for newly reported vulnerabilities. We investigate various prompt engineering strategies to enhance their accuracy and compare LLM-generated scores against those from embedding-based models, which use vector representations classified via supervised learning. Our results show that while LLMs demonstrate potential in automating CVSS evaluation, embedding-based methods outperform them in scoring the more subjective components, particularly the confidentiality, integrity, and availability impacts. These findings underscore the complexity of CVSS scoring and suggest that combining LLMs with embedding-based methods could yield more reliable results across all scoring components.