🤖 AI Summary
Incident response is often slowed by the delay in mapping an observed attack to the vulnerability it exploits, and manually correlating attack descriptions with CVE records does not scale. This paper evaluates 14 sentence-embedding models for automatically linking attack descriptions to CVE entries by semantic similarity, and finds that *multi-qa-mpnet-base-dot-v1* (MMPNet) performs best. Using attack Technique descriptions, MMPNet achieves an F1-score of 89.0, a precision of 84.0, and a recall of 94.7. On average, 56% of the vulnerabilities it identifies also appear in the CVE repository in conjunction with an attack. Manual inspection revealed 275 predicted links that are not documented in the MITRE repositories, which points to coverage gaps in current threat intelligence resources.
📝 Abstract
In security, vulnerabilities often remain undetected even after they have been exploited. In this work, vulnerabilities are publicly disclosed flaws documented in Common Vulnerabilities and Exposures (CVE) reports. Linking attacks to vulnerabilities is essential for timely incident response because it gives defenders immediate, actionable insight. Manually mapping attacks to CVEs is infeasible, which motivates automation. This paper evaluates 14 state-of-the-art (SOTA) sentence transformers for automatically identifying vulnerabilities from textual descriptions of attacks. The multi-qa-mpnet-base-dot-v1 (MMPNet) model achieves the best classification performance when using attack Technique descriptions, with an F1-score of 89.0, a precision of 84.0, and a recall of 94.7. On average, 56% of the vulnerabilities identified by MMPNet are also represented in the CVE repository in conjunction with an attack, while 61% of the vulnerabilities it detects correspond to those cataloged in the CVE repository. Manual inspection of the results revealed 275 predicted links that are not documented in the MITRE repositories. Automating the linking of attack techniques to vulnerabilities therefore improves detection of and response to software security incidents and shortens the window during which vulnerabilities remain exploitable, contributing to more secure systems.
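To make the linking idea concrete, the following is a minimal sketch of ranking CVE descriptions against an attack description by dot-product similarity. It is not the authors' pipeline. The encoder `toy_embed` is a bag-of-words stand-in chosen so the example runs without downloads, and the function `link_attack_to_cves`, the `top_k` parameter, the sample texts, and the placeholder CVE identifiers are all hypothetical. In practice one would presumably replace `toy_embed` with a sentence-transformer encoder such as multi-qa-mpnet-base-dot-v1.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text, vocab):
    """Stand-in encoder (assumption): L2-normalised bag-of-words counts.

    A real system would use a sentence-embedding model here instead.
    """
    counts = Counter(tokenize(text))
    vec = [float(counts[word]) for word in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def link_attack_to_cves(attack_text, cves, embed, top_k=2):
    """Return the top_k (cve_id, score) pairs, highest dot product first."""
    query = embed(attack_text)
    scored = [(cve_id, dot(query, embed(desc))) for cve_id, desc in cves.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical placeholder data, not taken from the paper or from any CVE record.
attack = "Adversaries exploit a buffer overflow in a web server to execute code remotely"
cves = {
    "CVE-XXXX-0001": "Buffer overflow in web server allows remote attackers to execute arbitrary code",
    "CVE-XXXX-0002": "Cross-site scripting in login form allows injection of script",
    "CVE-XXXX-0003": "Weak default password in router firmware allows administrative access",
}

# Build a shared vocabulary so every text maps into the same vector space.
vocab = sorted({tok for text in [attack, *cves.values()] for tok in tokenize(text)})


def embed(text):
    return toy_embed(text, vocab)


ranked = link_attack_to_cves(attack, cves, embed)
for cve_id, score in ranked:
    print(f"{cve_id}: {score:.3f}")
```

Because the encoder is passed in as a callable, swapping the toy encoder for a learned model only requires changing `embed`. The ranking step stays the same, and a score threshold (another assumption, not specified in the abstract) could turn the ranking into link or no-link decisions.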