TIJERE: A Novel Threat Intelligence Joint Extraction Model Based on Analyst Expert Knowledge

📅 2026-05-03
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing approaches to joint threat intelligence extraction are limited by feature entanglement, linguistic ambiguity, noise propagation, and relation overlap. To address these challenges, this work proposes TIJERE, a framework that formulates joint entity and relation extraction as a multisequence labeling representation (MSLR) task, assigning an independent label sequence to each entity pair. The tagging scheme integrates cybersecurity domain knowledge to enrich positional, contextual, and semantic representations, and the model builds on SecureBERT+, a language model fine-tuned on cybersecurity text. The work also introduces DNRTI-JE, described as the first publicly available jointly annotated cybersecurity dataset. Evaluated on DNRTI-JE, TIJERE achieves F1 scores exceeding 0.93 for named entity recognition and 0.98 for relation extraction, outperforming existing methods.
📝 Abstract
The extraction of entities and relationships from threat intelligence reports into structured formats, such as cybersecurity knowledge graphs, is essential for automated threat analysis, detection, and mitigation. However, existing joint extraction methods struggle with feature confusion, language ambiguity, noise propagation, and overlapping relations, resulting in low accuracy. This paper presents TIJERE, a joint entity and relation extraction framework that formulates joint extraction as a multisequence labeling representation (MSLR) problem. Specifically, a separate label sequence is generated for each entity pair. Unlike prior tagging schemes, MSLR integrates expert domain features to enrich positional, contextual, and semantic representations of entities, thereby enhancing feature distinction and classification accuracy. Additionally, TIJERE reduces language ambiguity and enhances domain-specific generalization by leveraging SecureBERT+, a contextual language model fine-tuned on cybersecurity text. This improves both named entity recognition (NER) and relation extraction (RE). This paper also introduces DNRTI-JE, the first publicly available jointly labeled dataset for cybersecurity entity and relation extraction, filling a crucial gap in cyber threat intelligence automation. Empirical evaluations on the curated DNRTI-JE dataset demonstrate that TIJERE achieves state-of-the-art performance, with F1-scores exceeding 0.93 for NER and 0.98 for RE, outperforming existing methods. Together, TIJERE and the DNRTI-JE benchmark dataset enable high-performance cybersecurity intelligence extraction, with transferable applications in healthcare, finance, and bioinformatics.
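The idea of generating a separate label sequence for each entity pair can be illustrated with a minimal sketch. The abstract does not specify the actual MSLR tag format, so everything below is an assumption made for illustration: the function name `build_pair_sequences`, the `B/I-<type>-<relation>-<role>` tag layout, the entity types, the relation names, and the example sentence are all hypothetical and are not taken from the paper or from DNRTI-JE.

```python
from itertools import permutations


def build_pair_sequences(tokens, entities, relations):
    """Build one tag sequence per related (head, tail) entity pair.

    tokens:    list of str
    entities:  list of (start, end_exclusive, entity_type) spans
    relations: dict mapping (head_index, tail_index) -> relation label

    The tag layout is a hypothetical stand-in, not the paper's scheme.
    """
    sequences = {}
    for head, tail in permutations(range(len(entities)), 2):
        relation = relations.get((head, tail))
        if relation is None:
            continue  # assumption: unrelated pairs get no sequence here
        tags = ["O"] * len(tokens)
        for role, index in (("H", head), ("T", tail)):
            start, end, entity_type = entities[index]
            for pos in range(start, end):
                prefix = "B" if pos == start else "I"
                tags[pos] = f"{prefix}-{entity_type}-{relation}-{role}"
        sequences[(head, tail)] = tags
    return sequences


# Hypothetical sentence with one entity taking part in two relations.
tokens = ["GroupX", "used", "ToolY", "against", "bank", "servers"]
entities = [(0, 1, "ACTOR"), (2, 3, "TOOL"), (4, 6, "ORG")]
relations = {(0, 1): "uses", (0, 2): "targets"}

pair_sequences = build_pair_sequences(tokens, entities, relations)
for pair, tags in pair_sequences.items():
    print(pair, tags)
```

Because each pair gets its own sequence, the entity `GroupX` can carry a `uses` tag in one sequence and a `targets` tag in another. In this sketch, that separation is what keeps overlapping relations from competing for a single tag per token.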
Problem

Research questions and friction points this paper is trying to address.

threat intelligence
joint extraction
entity recognition
relation extraction
cybersecurity
Innovation

Methods, ideas, or system contributions that make the work stand out.

joint extraction
multisequence labeling representation
expert knowledge integration
SecureBERT+
cybersecurity knowledge graph