Conflicting Scores, Confusing Signals: An Empirical Study of Vulnerability Scoring Systems

📅 2025-08-19
🤖 AI Summary
Existing vulnerability scoring systems (CVSS, SSVC, EPSS, and the Exploitability Index) assess the same vulnerabilities inconsistently because they pursue different objectives with different methodologies. This inconsistency undermines the reliability of risk prioritization. Method: This study conducts the first large-scale empirical comparison of these four major frameworks. It uses 600 real-world vulnerabilities from four months of Microsoft Patch Tuesday disclosures, coupled with observed real-world outcomes. It applies statistical analysis and classification performance metrics to evaluate how well each system supports vulnerability triage and predicts exploitation risk. Contribution/Results: Ranking consistency across the systems is low. EPSS achieves superior predictive accuracy for actual exploitation likelihood, whereas CVSS base scores correlate only weakly with real-world patching urgency. The study highlights the need for cross-framework score alignment and proposes concrete improvements in transparency, interpretability, and contextual adaptability to support data-driven vulnerability management decisions. The findings provide both empirical evidence and methodological guidance for refining operational risk prioritization.
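One common way to quantify "inter-system ranking consistency" is a rank correlation between two systems' scores for the same vulnerabilities. This summary does not say which statistic the authors used. The sketch below therefore assumes Spearman's rho, computed with tie-averaged ranks. The helper names (`average_ranks`, `spearman_rho`) and the six score pairs are hypothetical and are not taken from the paper. A rho near 1 means the two systems order vulnerabilities similarly. A value near 0 means their orderings are largely unrelated.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical scores for six vulnerabilities (not the paper's data).
    cvss_base = [9.8, 7.5, 8.8, 5.5, 7.8, 4.3]          # CVSS base scores, 0-10
    epss_prob = [0.02, 0.90, 0.01, 0.40, 0.30, 0.001]   # EPSS probabilities, 0-1
    rho = spearman_rho(cvss_base, epss_prob)
    print(f"Spearman rho (CVSS vs EPSS): {rho:.2f}")
```

Rank correlation fits this comparison because the systems use incompatible scales (a 0-10 score versus a 0-1 probability), so only the orderings they induce can be compared directly.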

📝 Abstract
Accurately assessing software vulnerabilities is essential for effective prioritization and remediation. While various scoring systems exist to support this task, their differing goals, methodologies, and outputs often lead to inconsistent prioritization decisions. This work provides the first large-scale, outcome-linked empirical comparison of four publicly available vulnerability scoring systems: the Common Vulnerability Scoring System (CVSS), the Stakeholder-Specific Vulnerability Categorization (SSVC), the Exploit Prediction Scoring System (EPSS), and the Exploitability Index. Using a dataset of 600 real-world vulnerabilities derived from four months of Microsoft's Patch Tuesday disclosures, we investigate the relationships between these scores and evaluate how they support vulnerability management tasks. We also examine how they categorize vulnerabilities across triage tiers and assess their ability to capture real-world exploitation risk. Our findings reveal significant disparities in how the scoring systems rank the same vulnerabilities. These disparities matter for organizations that rely on these metrics to make data-driven, risk-based decisions. We provide insights into where these systems align and diverge, highlighting the need for more transparent and consistent assessments of exploitability, risk, and severity.
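To make "categorizing vulnerabilities across triage tiers" and "capturing real-world exploitation risk" concrete, here is a minimal sketch. It buckets CVSS base scores using the CVSS v3.x qualitative severity scale (None 0.0, Low 0.1-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0). It then measures precision and recall of two hypothetical "prioritize now" rules against an exploitation label. The paper's actual tier definitions, thresholds, and metrics are not given in this summary. The `Vuln` record, the EPSS cutoff of 0.1, and the toy data are all assumptions chosen only to exercise the code. They are not the paper's data and say nothing about its results. SSVC decisions and Exploitability Index ratings are already categorical and are omitted here.

```python
from dataclasses import dataclass


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


@dataclass
class Vuln:
    vuln_id: str     # hypothetical identifier
    cvss: float      # CVSS base score, 0-10
    epss: float      # EPSS probability, 0-1
    exploited: bool  # hypothetical ground-truth exploitation label


def precision_recall(flagged: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Precision and recall of a 'prioritize now' flag against observed exploitation."""
    tp = sum(f and a for f, a in zip(flagged, actual))
    fp = sum(f and not a for f, a in zip(flagged, actual))
    fn = sum(a and not f for f, a in zip(flagged, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data chosen only to exercise the code; it is not drawn from the paper.
vulns = [
    Vuln("V1", 9.8, 0.02, False),
    Vuln("V2", 7.5, 0.90, True),
    Vuln("V3", 8.8, 0.01, False),
    Vuln("V4", 5.5, 0.40, True),
    Vuln("V5", 7.8, 0.30, False),
    Vuln("V6", 4.3, 0.001, False),
]
actual = [v.exploited for v in vulns]

# Rule A: prioritize vulnerabilities in the CVSS "High" and "Critical" tiers.
cvss_flags = [cvss_severity(v.cvss) in {"High", "Critical"} for v in vulns]
# Rule B: prioritize vulnerabilities with EPSS >= 0.1 (an arbitrary, hypothetical cutoff).
epss_flags = [v.epss >= 0.1 for v in vulns]

cvss_pr = precision_recall(cvss_flags, actual)
epss_pr = precision_recall(epss_flags, actual)
print(f"CVSS High/Critical: precision={cvss_pr[0]:.2f}, recall={cvss_pr[1]:.2f}")
print(f"EPSS >= 0.1:        precision={epss_pr[0]:.2f}, recall={epss_pr[1]:.2f}")
```

Precision measures how much of the prioritized set was actually exploited, while recall measures how many exploited vulnerabilities the rule caught. A triage policy usually has to trade one against the other.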
Problem

Research questions and friction points this paper is trying to address.

Comparing vulnerability scoring systems whose outputs lead to inconsistent prioritization decisions
Assessing scoring systems' ability to capture real-world exploitation risk
Evaluating alignment and divergence across four major scoring methodologies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale, outcome-linked empirical comparison of four publicly available vulnerability scoring systems
Dataset of 600 real-world vulnerabilities from four months of Microsoft Patch Tuesday disclosures
Analysis of scoring disparities and prioritization inconsistencies