Predicting known Vulnerabilities from Attack News: A Transformer-Based Approach

📅 2026-02-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes an automated, semantic-similarity-based approach for identifying the known software vulnerabilities (CVEs) exploited in attacks described in cybersecurity news, enabling timely response. The method uses the multi-qa-mpnet-base-dot-v1 sentence Transformer model to encode attack descriptions and performs semantic retrieval to produce a ranked list of candidate CVEs, followed by threshold-based filtering and manual (human-in-the-loop) verification. To the best of our knowledge, this is the first study to integrate semantic similarity with Transformer-based models for linking real-world attack narratives to CVE entries. Evaluated on a dataset of 100 SecurityWeek articles, the approach achieves 81% precision with threshold-based filtering, and in 57% of the articles at least one predicted CVE exactly matches a CVE-ID mentioned in the text. Manual assessment further indicates that 70% of the predicted CVEs are relevant, suggesting that the approach can support automated vulnerability identification from unstructured threat intelligence.
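The retrieval and threshold-filtering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the encoder here is a toy bag-of-words stand-in (an assumption, used so the example runs without downloading a model), the CVE IDs and descriptions are hypothetical placeholders, and the names `rank_cves`, `toy_encode`, `top_k`, and `threshold` are illustrative. Scoring uses a dot product, which matches the similarity the multi-qa-mpnet-base-dot-v1 model is tuned for.

```python
import re
from collections import Counter

# Hypothetical vocabulary for the toy encoder (assumption, not from the paper).
VOCAB = ["buffer", "overflow", "vpn", "sql", "injection", "login", "remote", "code"]


def toy_encode(text):
    """Stand-in encoder: bag-of-words counts over VOCAB.

    With the sentence-transformers package installed, one could instead pass
    SentenceTransformer("multi-qa-mpnet-base-dot-v1").encode as `encode`.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return [float(counts[word]) for word in VOCAB]


def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_cves(report, cve_descriptions, encode, top_k=3, threshold=0.0):
    """Rank CVE descriptions by similarity to a news report.

    Returns up to top_k (cve_id, score) pairs, highest score first,
    keeping only candidates whose score is at least `threshold`.
    """
    query = encode(report)
    scored = [(cve_id, dot(query, encode(desc)))
              for cve_id, desc in cve_descriptions.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(cve_id, score) for cve_id, score in scored[:top_k] if score >= threshold]


# Hypothetical CVE entries; the IDs are placeholders, not real records.
cves = {
    "CVE-0000-0001": "Buffer overflow in a VPN gateway allows remote code execution",
    "CVE-0000-0002": "SQL injection in a login form",
    "CVE-0000-0003": "Cross-site scripting in a comment widget",
}
report = "Attackers exploited a buffer overflow in the VPN appliance to run remote code"

results = rank_cves(report, cves, toy_encode, top_k=3, threshold=1.0)
print(results)
```

The threshold value is a tuning choice: raising it trades recall for precision, which is the trade-off the threshold-based filtering step controls.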

📝 Abstract

Identifying the vulnerabilities exploited during cyberattacks is essential for enabling timely responses and effective mitigation in software security. This paper examines the prediction of software vulnerabilities, specifically Common Vulnerabilities and Exposures (CVEs), from unstructured descriptions of attacks reported in cybersecurity news articles. We propose a semantic similarity-based approach utilizing the multi-qa-mpnet-base-dot-v1 (MPNet) sentence transformer model to generate a ranked list of the most likely CVEs corresponding to each news report. To assess the accuracy of the predicted vulnerabilities, we implement four complementary validation methods: filtering predictions based on similarity thresholds, conducting manual validation, performing semantic comparisons with the first vulnerability explicitly mentioned in each report, and comparing against all CVEs referenced within the report. Experimental results on a dataset of 100 SecurityWeek news articles show that the model attains a precision of 81 percent when employing threshold-based filtering. In manual evaluation, 70 percent of the predictions are judged relevant, while semantic comparisons with the CVEs mentioned in the reports show agreement rates of 80 percent for the first mentioned vulnerability and 78 percent across all referenced CVEs. In 57 percent of the news reports analyzed, at least one predicted vulnerability exactly matched a CVE-ID mentioned in the article. These findings underscore the model's potential to facilitate automated vulnerability identification from real-world cyberattack news reports.
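The exact-match measure in the abstract (the share of reports in which at least one predicted CVE matches a CVE-ID mentioned in the article) can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not say how CVE-IDs are extracted from article text, so the regular expression, the function names `mentioned_cves` and `exact_match_rate`, and the sample articles with placeholder IDs are all hypothetical.

```python
import re

# Assumed pattern for CVE identifiers: "CVE-", a 4-digit year, then 4+ digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


def mentioned_cves(article_text):
    """Return the set of CVE-IDs that appear in the article, upper-cased."""
    return {match.upper() for match in CVE_PATTERN.findall(article_text)}


def exact_match_rate(articles, predictions):
    """Fraction of articles where at least one predicted ID is mentioned.

    `articles` is a list of article texts; `predictions` is a parallel list
    of predicted CVE-ID lists, one per article.
    """
    if not articles:
        return 0.0
    hits = 0
    for text, predicted in zip(articles, predictions):
        if mentioned_cves(text) & {cve_id.upper() for cve_id in predicted}:
            hits += 1
    return hits / len(articles)


# Hypothetical articles and predictions; the IDs are placeholders.
articles = [
    "The flaw, tracked as CVE-0000-0001, was exploited in the wild.",
    "Researchers linked the campaign to cve-0000-0002 and CVE-0000-0003.",
]
predictions = [
    ["CVE-0000-0001", "CVE-0000-0009"],  # first ID is mentioned: a hit
    ["CVE-0000-0007"],                   # no overlap: a miss
]

rate = exact_match_rate(articles, predictions)
print(rate)
```

Because a report counts as a hit when any one prediction matches, this measure rewards longer ranked lists; it complements, rather than replaces, the precision and manual-relevance figures.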

Problem

Research questions and friction points this paper is trying to address.

vulnerability prediction

CVE

cybersecurity news

attack reports

software security

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based vulnerability prediction

semantic similarity

CVE identification