Noise Contrastive Estimation-based Matching Framework for Low-Resource Security Attack Pattern Recognition

📅 2024-01-18

🏛️ Findings

📈 Citations: 5

✨ Influential: 2

career value

174K/year

🤖 AI Summary

To address the challenge of fine-grained matching between threat reports and MITRE ATT&CK TTPs in low-resource settings—characterized by numerous classes, severe label imbalance, and complex hierarchical semantics—this work departs from conventional multi-class or multi-label classification paradigms. Instead, it introduces Noise Contrastive Estimation (NCE) to TTP semantic matching for the first time, proposing a sampling-driven “learning-to-contrast” mechanism. The method adopts a dual-tower architecture integrating BERT-based text encoding, dynamic negative sampling, and NCE loss, enabling effective few-shot and zero-shot cross-domain transfer. Evaluated on real-world CTI datasets, our approach achieves a 12.6% improvement in F1-score, outperforms state-of-the-art models by 9.3% in cross-domain accuracy, and attains an inference speed of 320 samples per second—demonstrating both high precision and operational efficiency.

Technology Category

Application Category

📝 Abstract

Techniques, Tactics and Procedures (TTP) mapping is an important and difficult task in the application of cyber threat intelligence (CTI) extraction for threat reports. TTPs are typically expressed in semantic forms within security knowledge bases like MITRE ATT&CK, serving as textual high-level descriptions for sophisticated attack patterns. Conversely, attacks in CTI threat reports are detailed in a combination of natural and technical language forms, presenting a significant challenge even for security experts to establish correlations or mappings with the corresponding TTPs.Conventional learning approaches often target the TTP mapping problem in the classical multiclass/label classification setting. This setting hinders the learning capabilities of the model, due to the large number of classes (i.e., TTPs), the inevitable skewness of the label distribution and the complex hierarchical structure of the label space. In this work, we approach the problem in a different learning paradigm, such that the assignment of a text to a TTP label is essentially decided by the direct semantic similarity between the two, thus, reducing the complexity of competing solely over the large labeling space. In order that, we propose a neural matching architecture that incorporates a sampling based learn-to-compare mechanism, facilitating the learning process of the matching model despite constrained resources.

Problem

Research questions and friction points this paper is trying to address.

Recognizing TTPs in low-resource cybersecurity text

Addressing label skewness and hierarchical complexity

Improving semantic similarity-based attack pattern matching

Innovation

Methods, ideas, or system contributions that make the work stand out.

Noise Contrastive Estimation-based matching framework

Semantic similarity for TTP label assignment

Sampling-based learn-to-compare mechanism

🔎 Similar Papers

No similar papers found.