Conflicting Scores, Confusing Signals: An Empirical Study of Vulnerability Scoring Systems

📅 2025-08-19

🤖 AI Summary

Existing vulnerability scoring systems (CVSS, SSVC, EPSS, and the Exploitability Index) assess the same vulnerabilities in substantially inconsistent ways because their objectives and methodologies diverge, which undermines the reliability of risk prioritization.

Method: This study conducts the first large-scale empirical comparison of these four major frameworks using real-world vulnerability data coupled with observed patching behavior. It applies statistical analysis and classification performance metrics to evaluate how well each system supports vulnerability triage and predicts exploitation risk.

Contribution/Results: Inter-system ranking consistency is low. EPSS achieves the best predictive accuracy for actual exploitation likelihood, whereas CVSS base scores correlate only weakly with real-world patching urgency. The study highlights the need for cross-framework score alignment and proposes concrete improvements in transparency, interpretability, and contextual adaptability to support data-driven vulnerability management. The findings provide both empirical evidence and methodological guidance for refining operational risk prioritization.
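Inter-system ranking consistency is usually quantified with a rank correlation. The summary does not say which statistic the authors used, so the following is only a minimal sketch assuming Spearman's rho, computed from scratch as the Pearson correlation of tie-aware average ranks. The function names `average_ranks` and `spearman_rho` and the four-item input are hypothetical illustrations, not data from the study.

```python
from typing import List


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: List[float], y: List[float]) -> float:
    """Spearman's rho: Pearson correlation of the two average-rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("ranks are constant; correlation is undefined")
    return cov / (vx * vy) ** 0.5


# Hypothetical CVSS base scores and EPSS probabilities for the same four items.
cvss = [9.8, 7.5, 5.3, 8.8]
epss = [0.02, 0.91, 0.001, 0.35]
print(spearman_rho(cvss, epss))  # → 0.2
```

A value near 1 means two systems order vulnerabilities similarly, while values near 0 indicate little agreement. In practice `scipy.stats.spearmanr` computes the same quantity with the same average-rank treatment of ties.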

📝 Abstract

Accurately assessing software vulnerabilities is essential for effective prioritization and remediation. While various scoring systems exist to support this task, their differing goals, methodologies, and outputs often lead to inconsistent prioritization decisions. This work provides the first large-scale, outcome-linked empirical comparison of four publicly available vulnerability scoring systems: the Common Vulnerability Scoring System (CVSS), the Stakeholder-Specific Vulnerability Categorization (SSVC), the Exploit Prediction Scoring System (EPSS), and the Exploitability Index. Using a dataset of 600 real-world vulnerabilities derived from four months of Microsoft's Patch Tuesday disclosures, we investigate the relationships between these scores, evaluate how they support vulnerability management tasks, examine how they categorize vulnerabilities across triage tiers, and assess their ability to capture real-world exploitation risk. Our findings reveal significant disparities in how scoring systems rank the same vulnerabilities, with implications for organizations relying on these metrics to make data-driven, risk-based decisions. We provide insights into the alignment and divergence of these systems, highlighting the need for more transparent and consistent exploitability, risk, and severity assessments.
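One way to see how two systems place the same vulnerabilities into triage tiers is to cross-tabulate their labels. The sketch below assumes CVSS v3.x base scores mapped to the specification's qualitative severity ratings (None, Low, Medium, High, Critical) and SSVC decisions taken from CISA's outcome labels (Track, Track*, Attend, Act); the paper may define its tiers differently. The helper names and the example pairs are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"       # 0.1-3.9
    if score < 7.0:
        return "Medium"    # 4.0-6.9
    if score < 9.0:
        return "High"      # 7.0-8.9
    return "Critical"      # 9.0-10.0


def tier_crosstab(pairs: Iterable[Tuple[float, str]]) -> Counter:
    """Count how often each (CVSS severity, SSVC decision) combination occurs."""
    return Counter((cvss_severity(score), decision) for score, decision in pairs)


# Hypothetical (CVSS base score, SSVC decision) pairs, not data from the study.
sample = [(9.8, "Track"), (9.8, "Act"), (7.5, "Attend"), (5.3, "Track")]
for (severity, decision), count in sorted(tier_crosstab(sample).items()):
    print(f"{severity:<8} {decision:<7} {count}")
```

Combinations such as a Critical CVSS rating paired with a Track decision are the kind of disagreement that produces inconsistent prioritization when an organization relies on only one system.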

Problem

Research questions and friction points this paper is trying to address.

Comparing vulnerability scoring systems whose divergent outputs lead to inconsistent prioritization decisions

Assessing scoring systems' ability to capture real-world exploitation risk

Evaluating alignment and divergence across four major scoring methodologies
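Assessing whether a score captures real-world exploitation risk amounts to treating it as a binary classifier checked against observed exploitation. As a minimal sketch, assuming a hypothetical cutoff above which a vulnerability is flagged for priority patching and a boolean "exploited" label per vulnerability, precision and recall can be computed as below. The threshold, labels, and function name are illustrative assumptions; the paper's exact metrics and cutoffs are not given here.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], exploited: List[bool], threshold: float
) -> Tuple[float, float]:
    """Flag items with score >= threshold and compare against exploitation labels."""
    if len(scores) != len(exploited):
        raise ValueError("scores and labels must have the same length")
    flagged = [s >= threshold for s in scores]
    tp = sum(f and e for f, e in zip(flagged, exploited))        # flagged and exploited
    fp = sum(f and not e for f, e in zip(flagged, exploited))    # flagged, not exploited
    fn = sum((not f) and e for f, e in zip(flagged, exploited))  # missed exploitation
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical EPSS probabilities, exploitation labels, and a 0.1 cutoff.
epss = [0.91, 0.35, 0.02, 0.001]
exploited = [True, False, True, False]
print(precision_recall(epss, exploited, threshold=0.1))  # → (0.5, 0.5)
```

Sweeping the threshold and repeating the calculation for each system gives a simple basis for comparing how well CVSS, EPSS, or other scores separate exploited from unexploited vulnerabilities.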

Innovation

Methods, ideas, or system contributions that make the work stand out.

Empirical comparison of four vulnerability scoring systems

Dataset of 600 real-world vulnerabilities from four months of Microsoft Patch Tuesday disclosures

Analysis of scoring disparities and prioritization inconsistencies
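Prioritization inconsistency can also be made concrete by asking how many vulnerabilities two systems would both place within a fixed remediation budget of k items. The sketch below assumes a Jaccard overlap of top-k sets, which is one common choice rather than the paper's stated measure; the identifiers `v1` through `v4` and their scores are hypothetical.

```python
from typing import Dict, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """IDs of the k highest-scoring items (ties broken by ID for determinism)."""
    ranked = sorted(scores, key=lambda vid: (-scores[vid], vid))
    return set(ranked[:k])


def top_k_jaccard(a: Dict[str, float], b: Dict[str, float], k: int) -> float:
    """Jaccard overlap of the top-k sets two systems select from shared items."""
    common = a.keys() & b.keys()
    if k <= 0 or k > len(common):
        raise ValueError("k must be between 1 and the number of shared items")
    ta = top_k({v: a[v] for v in common}, k)
    tb = top_k({v: b[v] for v in common}, k)
    return len(ta & tb) / len(ta | tb)


# Hypothetical scores for the same four vulnerabilities under two systems.
cvss = {"v1": 9.8, "v2": 7.5, "v3": 5.3, "v4": 8.8}
epss = {"v1": 0.02, "v2": 0.91, "v3": 0.001, "v4": 0.35}
print(top_k_jaccard(cvss, epss, k=2))
```

Here the two systems agree on only one of their top-two picks, so an organization patching by CVSS and another patching by EPSS would start with different work queues.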
