🤖 AI Summary
Existing research on automated TTP (Tactics, Techniques, and Procedures) extraction lacks a unified framework, resulting in fragmented methodologies, inconsistent evaluations, and poor reproducibility. This study presents the first multidimensional systematic review of the field, analyzing 80 peer-reviewed publications with respect to extraction objectives, data sources, modeling approaches (including rule-based systems, traditional machine learning, Transformer architectures, and large language models), and evaluation metrics. The analysis reveals a predominant focus on technique-level single-label classification, while tactic-level categorization and technique retrieval remain significantly underexplored. Furthermore, narrow dataset coverage and the frequent absence of publicly available code hinder model generalization and reproducibility. By synthesizing current practices and identifying critical gaps, this work provides a clear roadmap for future research in automated TTP extraction.
📝 Abstract
Adversaries continuously evolve their tactics, techniques, and procedures (TTPs) to achieve their objectives while evading detection, requiring defenders to continually update their understanding of adversary behavior. Prior research has proposed automated extraction of TTP-related intelligence from unstructured text and its mapping to structured knowledge bases such as MITRE ATT&CK. However, existing work varies widely in extraction objectives, datasets, modeling approaches, and evaluation practices, making the research landscape difficult to survey. The goal of this study is to aid security researchers in understanding the state of the art in extracting TTPs from unstructured text by analyzing the relevant literature. We systematically analyze 80 peer-reviewed studies across key dimensions: extraction purposes, data sources, dataset construction, modeling approaches, evaluation metrics, and artifact availability. Our analysis reveals several clear trends. Technique-level classification remains the dominant task formulation, while tactic classification and technique searching are underexplored. The field has progressed from rule-based and traditional machine learning methods to transformer-based architectures (e.g., BERT, SecureBERT, RoBERTa), with recent studies exploring LLM-based approaches including prompting, retrieval-augmented generation, and fine-tuning, though adoption remains emergent. Despite these advances, important limitations persist: many studies rely on single-label classification, limited evaluation settings, and narrow datasets, constraining cross-domain generalization. Reproducibility is further hindered by proprietary datasets, limited code releases, and restricted corpora.