From Attack Descriptions to Vulnerabilities: A Sentence Transformer-Based Approach

📅 2025-09-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Security incident response is often slowed by the gap between observing an attack and identifying the vulnerability it exploits, and manually correlating attack descriptions with CVE records does not scale. This paper evaluates 14 sentence transformers for automatically linking textual attack descriptions to CVE entries through semantic matching. The best performer, *multi-qa-mpnet-base-dot-v1* (MMPNet), reaches an F1-score of 89.0, a precision of 84.0, and a recall of 94.7 when given attack Technique descriptions. On average, 56% of the vulnerabilities it identifies also appear in the CVE repository in conjunction with an attack. Manual inspection found 275 predicted links that are not documented in the MITRE repositories, which points to coverage gaps in existing threat intelligence resources.

📝 Abstract

In the domain of security, vulnerabilities frequently remain undetected even after their exploitation. In this work, vulnerabilities refer to publicly disclosed flaws documented in Common Vulnerabilities and Exposures (CVE) reports. Establishing a connection between attacks and vulnerabilities is essential for enabling timely incident response, as it provides defenders with immediate, actionable insights. However, manually mapping attacks to CVEs is infeasible, thereby motivating the need for automation. This paper evaluates 14 state-of-the-art (SOTA) sentence transformers for automatically identifying vulnerabilities from textual descriptions of attacks. Our results demonstrate that the multi-qa-mpnet-base-dot-v1 (MMPNet) model achieves superior classification performance when using attack Technique descriptions, with an F1-score of 89.0, precision of 84.0, and recall of 94.7. Furthermore, it was observed that, on average, 56% of the vulnerabilities identified by the MMPNet model are also represented within the CVE repository in conjunction with an attack, while 61% of the vulnerabilities detected by the model correspond to those cataloged in the CVE repository. A manual inspection of the results revealed the existence of 275 predicted links that were not documented in the MITRE repositories. Consequently, the automation of linking attack techniques to vulnerabilities not only enhances the detection and response capabilities related to software security incidents but also diminishes the duration during which vulnerabilities remain exploitable, thereby contributing to the development of more secure systems.
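The matching step described in the abstract can be sketched as a dot-product ranking of CVE descriptions against an attack description. This is a minimal illustration, not the authors' pipeline: the function names (`rank_cves`, `make_toy_encoder`), the placeholder CVE identifiers, and the example texts are all assumptions, and the toy bag-of-words encoder only stands in for a real sentence transformer so the sketch runs without downloading a model.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

Encoder = Callable[[List[str]], List[List[float]]]


def tokenize(text: str) -> List[str]:
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def make_toy_encoder(corpus: Sequence[str]) -> Encoder:
    """Build a bag-of-words encoder over a fixed vocabulary.

    Assumption: this is only a stand-in so the sketch is self-contained.
    It is NOT a sentence transformer and captures no semantics.
    """
    vocab: Dict[str, int] = {}
    for text in corpus:
        for token in tokenize(text):
            vocab.setdefault(token, len(vocab))

    def encode(texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * len(vocab)
            for token in tokenize(text):
                if token in vocab:
                    vec[vocab[token]] += 1.0
            vectors.append(vec)
        return vectors

    return encode


def rank_cves(
    attack_text: str,
    cves: Sequence[Tuple[str, str]],
    encode: Encoder,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank (cve_id, description) pairs by dot product with the attack text."""
    query_vec = encode([attack_text])[0]
    cve_vecs = encode([description for _, description in cves])
    scored = [
        (cve_id, sum(q * c for q, c in zip(query_vec, vec)))
        for (cve_id, _), vec in zip(cves, cve_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical example data: the identifiers and texts are placeholders.
attack = "adversary exploits buffer overflow in web server to execute code"
cves = [
    ("CVE-EXAMPLE-1", "buffer overflow in web server allows remote code execution"),
    ("CVE-EXAMPLE-2", "sql injection in login form"),
    ("CVE-EXAMPLE-3", "weak default password in router firmware"),
]

encode = make_toy_encoder([attack] + [description for _, description in cves])
ranking = rank_cves(attack, cves, encode)

# To use the model named in the abstract instead (requires the
# sentence-transformers package and a model download):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")
#   encode = lambda texts: model.encode(texts).tolist()
```

The sketch scores with a raw dot product rather than cosine similarity because multi-qa-mpnet-base-dot-v1 is trained for dot-product scoring. How the paper turns ranked scores into link decisions (for example, a threshold or a top-k cutoff) is not stated in the abstract, so `top_k` here is purely illustrative.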

Problem

Research questions and friction points this paper is trying to address.

Automating the mapping from attack descriptions to the vulnerabilities they exploit

Evaluating sentence transformers for identifying CVEs from textual attack descriptions

Reducing manual effort in linking security incidents to vulnerabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses sentence transformers for vulnerability identification

Evaluates 14 SOTA models for attack description analysis

MMPNet model achieves an F1-score of 89.0
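As a quick consistency check, the reported F1-score can be recomputed from the precision (84.0) and recall (94.7) given in the abstract, since F1 is the harmonic mean of the two. The helper name `f1_score` below is an illustrative assumption; only the three reported figures come from the source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, on a 0-100 scale.
reported_precision = 84.0
reported_recall = 94.7

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 1))  # matches the reported F1-score of 89.0
```

The recomputed value rounds to the reported figure, so the three metrics are mutually consistent.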

🔎 Similar Papers

Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models