Scoring the Unscorables: Cyber Risk Assessment Beyond Internet Scans

📅 2025-06-07
🤖 AI Summary
Problem: Data breach risk is hard to quantify for small and medium-sized enterprises (SMEs) because IP-address-based internet scan data is often incomplete (missing IP address mappings) or entirely unavailable for them. Method: This study proposes a scan-free risk assessment approach built on publicly accessible technology signatures extracted from organizations' websites, such as the CMS, JavaScript libraries, and HTTP headers. It validates a strong correlation between website technology stacks and cybersecurity posture, and trains XGBoost/LightGBM prediction models on multi-source security incident data. Contribution/Results: The models achieve an AUC above 0.89 and generalize robustly across multiple independent datasets. They identify 12 technology-stack combinations significantly associated with ransomware attacks. Applied to over two million SMEs, the approach sidesteps the limitations of traditional scanning and enables low-cost, large-scale, high-accuracy risk quantification.
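The summary does not specify which signatures are extracted or how they are encoded as model inputs. As a rough illustration only, the Python sketch below assumes each website is represented by its HTTP response headers and homepage HTML, and turns a few example signals (a `<meta name="generator">` CMS tag, the jQuery major version in a script URL, the `Server`/`X-Powered-By` products, and absent security headers) into binary features. The rule set, token format, and the names `extract_signatures` and `to_feature_vector` are hypothetical, not the authors' pipeline.

```python
import re
from html.parser import HTMLParser

# Hypothetical rules for illustration only; not the paper's signature vocabulary.
JQUERY_SRC = re.compile(r"jquery[-.]?(\d+)(?:\.\d+)*(?:\.min)?\.js")
SECURITY_HEADERS = ("strict-transport-security", "content-security-policy",
                    "x-frame-options")


class SignatureParser(HTMLParser):
    """Collects tokens from <meta name="generator"> and <script src=...> tags."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}  # attribute values may be None
        if tag == "meta" and a.get("name", "").lower() == "generator":
            words = a.get("content", "").lower().split()
            if words:
                self.tokens.add("cms:" + words[0])  # "WordPress 6.4" -> cms:wordpress
        elif tag == "script" and a.get("src"):
            src = a["src"].lower()
            m = JQUERY_SRC.search(src)
            if m:
                self.tokens.add("js:jquery@" + m.group(1))  # keep major version only
            if "/wp-content/" in src or "/wp-includes/" in src:
                self.tokens.add("cms:wordpress")


def extract_signatures(headers, html):
    """Return a set of technology-signature tokens for one crawled homepage."""
    h = {k.lower(): v.strip() for k, v in headers.items()}
    tokens = set()
    for name in ("server", "x-powered-by"):
        # Keep the product name, drop the version: "Apache/2.4.41 (Ubuntu)" -> apache
        words = h.get(name, "").split("/")[0].split()
        if words:
            tokens.add(f"{name}:{words[0].lower()}")
    for name in SECURITY_HEADERS:
        if name not in h:
            tokens.add("missing:" + name)  # absence of a hardening header is a signal too
    parser = SignatureParser()
    parser.feed(html)
    parser.close()
    return tokens | parser.tokens


def to_feature_vector(tokens, vocabulary):
    """Binary-encode a token set against a fixed, ordered vocabulary."""
    return [1 if t in tokens else 0 for t in vocabulary]
```

Encoding every site against one shared, ordered vocabulary keeps the columns aligned across millions of organizations. Fitting a gradient-boosted classifier such as the XGBoost/LightGBM models named in the summary would follow; the sketch stops at feature construction.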

📝 Abstract
In this paper we present a study on using novel data types to perform cyber risk quantification by estimating the likelihood of a data breach. We demonstrate that it is feasible to build a highly accurate cyber risk assessment model using public, readily available technology signatures obtained by crawling an organization's website. This approach overcomes the limitations of previous approaches that relied on large-scale, IP-address-based scanning data, which suffers from incomplete or missing IP address mappings and is unavailable for large numbers of small and medium-sized organizations (SMEs). Compared with scan data, technology signature data is far more readily available for millions of SMEs. Our study shows a strong relationship between these technology signatures and an organization's cybersecurity posture. By cross-validating our model on different cyber incident datasets, we also highlight key differences between ransomware attack victims and the broader population of cyber incident and data breach victims.
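The abstract describes cross-validating one model against several cyber incident datasets but does not name the metric here; the AI summary reports AUC, so the Python sketch below assumes ROC AUC. It computes AUC from the Mann-Whitney U statistic (tied scores share an average rank) and scores one set of risk predictions against several incident sources, treating organizations that do not appear in a source as negatives. The function names, the dictionary-based inputs, and that labeling convention are illustrative assumptions, not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic; tied scores share an average rank."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group i..j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j + 1) if pairs[k][1] == 1)
        i = j + 1
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def auc_per_dataset(risk_scores, incident_datasets):
    """Evaluate one set of risk scores against several incident sources.

    risk_scores:       org id -> predicted breach likelihood
    incident_datasets: source name -> set of org ids reported as victims there
    Orgs absent from a source are treated as negatives (a simplifying assumption,
    since unreported incidents cannot be told apart from non-incidents).
    """
    orgs = sorted(risk_scores)
    scores = [risk_scores[o] for o in orgs]
    return {
        name: roc_auc([1 if o in victims else 0 for o in orgs], scores)
        for name, victims in incident_datasets.items()
    }
```

With toy scores `{"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1, "e": 0.5}` and sources `{"breaches": {"a", "b"}, "ransomware": {"c"}}`, the first source gets AUC 1.0 and the second 0.25. The same scores can rank one victim population well and another poorly, which is the kind of gap that evaluating on several incident datasets exposes.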

Problem

Estimating data breach likelihood using novel data types
Overcoming limitations of IP-based scanning for SMEs
Linking technology signatures to cybersecurity posture

Innovation

Uses public website technology signatures
Overcomes IP scan data limitations
Links signatures to cybersecurity posture