🤖 AI Summary
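Quantifying data breach risk for small and medium-sized enterprises (SMEs) is difficult because the large-scale IP address scanning data that earlier approaches rely on suffers from incomplete or missing IP address mappings and is unavailable for many of these organizations.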
Method: This study proposes a scan-free risk assessment approach that uses publicly accessible technology signatures (such as the content management system (CMS), JavaScript libraries, and HTTP headers) extracted from organizational websites. It validates a strong correlation between website technology stacks and cybersecurity posture, and develops XGBoost/LightGBM prediction models trained on multi-source security incident data.
Contribution/Results: The models achieve an AUC exceeding 0.89 and demonstrate robust generalization across multiple independent datasets. They identify 12 technology-stack combinations significantly associated with ransomware attacks. Applied to over two million SMEs, the approach overcomes traditional scanning limitations, enabling low-cost, large-scale, and high-accuracy risk quantification.
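For readers unfamiliar with the AUC metric quoted above, the following is a minimal sketch of how it can be computed from predicted breach scores. It is not the authors' evaluation code: the function name `roc_auc` and the toy labels and scores are hypothetical, chosen only to illustrate the definition (the probability that a randomly chosen breached organization receives a higher score than a randomly chosen non-breached one).

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 ground-truth values (1 = breach observed).
    scores: iterable of model scores, higher meaning higher predicted risk.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: four organizations, two of them breached.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.1, 0.4, 0.6]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations only; at the scale described in the summary, a rank-based library implementation would be the practical choice.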
📝 Abstract
In this paper, we present a study on using novel data types for cyber risk quantification, specifically for estimating the likelihood of a data breach. We demonstrate that it is feasible to build a highly accurate cyber risk assessment model using public and readily available technology signatures obtained by crawling an organization's website. This approach overcomes the limitations of previous approaches that relied on large-scale IP-address-based scanning data, which suffers from incomplete or missing IP address mappings and is unavailable for large numbers of small and medium-sized organizations (SMEs). Compared with scan data, technology digital signature data is more readily available for millions of SMEs. Our study shows that there is a strong relationship between these technology signatures and an organization's cybersecurity posture. In cross-validating our model on different cyber incident datasets, we also highlight the key differences between ransomware attack victims and the larger population of cyber incident and data breach victims.
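As an illustration of what "technology signatures obtained from crawling an organization's website" might look like in practice, here is a minimal sketch that turns one page's HTML and HTTP response headers into binary features. It is an assumption-laden example, not the paper's pipeline: the names `SignatureParser`, `LIBRARY_PATTERNS`, and `extract_signatures`, the feature-key format, and the choice of signals (generator meta tag, script URLs, `Server` and `X-Powered-By` headers) are all hypothetical.

```python
import re
from html.parser import HTMLParser


class SignatureParser(HTMLParser):
    """Collects generator meta tags and external script URLs from HTML."""

    def __init__(self):
        super().__init__()
        self.generators = []
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "generator":
            self.generators.append(attr.get("content") or "")
        elif tag == "script" and attr.get("src"):
            self.scripts.append(attr["src"])


# Hypothetical patterns for two common JavaScript libraries; a real
# signature catalogue would be far larger.
LIBRARY_PATTERNS = {
    "jquery": re.compile(r"jquery[.-](\d+(?:\.\d+)*)", re.I),
    "bootstrap": re.compile(r"bootstrap[.-](\d+(?:\.\d+)*)", re.I),
}

# Headers assumed to be informative about the server-side stack.
HEADER_KEYS = ("server", "x-powered-by")


def extract_signatures(html, headers):
    """Return a dict of binary features describing the observed stack."""
    parser = SignatureParser()
    parser.feed(html)

    features = {}
    for generator in parser.generators:
        features["cms:" + generator.strip().lower()] = 1
    for src in parser.scripts:
        for name, pattern in LIBRARY_PATTERNS.items():
            match = pattern.search(src)
            if match:
                features[f"js:{name}:{match.group(1)}"] = 1

    lowered = {k.lower(): v for k, v in headers.items()}
    for key in HEADER_KEYS:
        if key in lowered:
            features[f"header:{key}:{lowered[key].lower()}"] = 1
    return features


# Hypothetical page and response headers, used only for demonstration.
sample_html = (
    '<html><head><meta name="generator" content="WordPress 6.4">'
    '<script src="/js/jquery-3.6.0.min.js"></script></head></html>'
)
sample_headers = {"Server": "nginx/1.18.0", "X-Powered-By": "PHP/7.4.3"}
sample_features = extract_signatures(sample_html, sample_headers)
```

Feature dictionaries of this kind can be one-hot encoded into a sparse matrix and passed to a gradient-boosted classifier; which signals and encodings the authors actually used is not stated in the abstract.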