Context-aware Entity-Relation Extraction for Threat Intelligence Knowledge Graphs

📅 2026-05-15

🤖 AI Summary
This study addresses the challenges of extracting entity–relation triples from unstructured cyber threat intelligence (CTI) reports to construct knowledge graphs: complex report structure, domain-specific language, semantic ambiguity, and the error propagation these cause in existing pipeline-based approaches. The authors propose CTiKG, a framework that extracts and classifies threat entities and their relations using hybrid NLP models that combine context-aware SecureBERT+ embeddings with expert knowledge from a domain ontology, with the aim of reducing misclassifications and mitigating cascading errors. On the DNRTI-AUG-STIX2 dataset, whose 21 entity types are aligned with STIX 2.1, CTiKG reports gains of 3–4% in named entity recognition and up to 8% in relation extraction over state-of-the-art baselines, measured by precision, recall, and F1-score. Additional validation on the DNRTI and STUCCO benchmarks supports its robustness and practical applicability.
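The reported gains are expressed in precision, recall, and F1-score. As a minimal sketch of how such scores can be computed for extracted triples, the snippet below uses exact-match scoring over sets of triples. The function name `precision_recall_f1`, the matching rule, and the toy triples (`GroupA`, `MalwareX`, and so on) are assumptions for illustration only; the paper's actual evaluation protocol is not described in this summary.

```python
from typing import Iterable, Set, Tuple

# A triple is (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def precision_recall_f1(
    predicted: Iterable[Triple], gold: Iterable[Triple]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over sets of triples."""
    pred_set: Set[Triple] = set(predicted)
    gold_set: Set[Triple] = set(gold)

    # A prediction counts as correct only if all three fields match.
    true_positives = len(pred_set & gold_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, not taken from any dataset named in the paper.
gold = [
    ("GroupA", "uses", "MalwareX"),
    ("MalwareX", "targets", "OrgB"),
    ("MalwareX", "exploits", "VulnY"),
]
predicted = [
    ("GroupA", "uses", "MalwareX"),   # correct
    ("MalwareX", "targets", "OrgC"),  # wrong tail entity
]

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Exact matching is the strictest choice; evaluations that allow partial span overlap would score the same predictions differently.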
📝 Abstract
Cybersecurity Knowledge Graphs (CKGs) unify diverse Cyber Threat Intelligence (CTI) sources into structured, queryable formats, offering scalable solutions for automating proactive and real-time security responses. Their increasing adoption has significantly enhanced the workflow and decision-making efficiency of security professionals. However, constructing CKGs requires extracting entity-relation triples from unstructured CTI reports, a task hindered by complex report structure, domain-specific language, and semantic ambiguity. As a result, existing pipeline-based approaches often suffer from error propagation, reducing extraction accuracy and limiting generalizability. This paper introduces the Context-aware Threat Intelligence Knowledge Graph (CTiKG) framework, a pipeline architecture designed to accurately extract and classify threat entities and their relationships from CTI reports. CTiKG incorporates hybrid NLP models that leverage SecureBERT+ contextual embeddings and expert knowledge from a domain ontology to reduce misclassifications and mitigate cascading errors. Experiments on the DNRTI-AUG-STIX2 dataset, which comprises 21 entity types aligned with STIX 2.1, demonstrate significant improvements over state-of-the-art baselines, yielding 3-4% gains in NER and up to 8% in RE performance, based on precision, recall, and F1-score. Additional validation on DNRTI and STUCCO benchmarks confirms the framework's robustness and practical applicability. All datasets, including the curated DNRTI-AUG-STIX2, are released on GitHub to foster reproducibility and further research.
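The abstract says that expert knowledge from a domain ontology helps reduce misclassifications, but it does not say how the ontology is applied. One plausible mechanism, sketched below as an assumption rather than as the authors' method, is to reject candidate relations whose head and tail entity types are not permitted for that relation. The `ALLOWED` table, the `Entity` and `Candidate` types, the function `filter_by_ontology`, and all entity names are hypothetical; the type labels follow STIX 2.1 naming, but the table is a small illustrative subset, not the full specification.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Entity(NamedTuple):
    text: str
    type: str  # STIX-style type label, e.g. "malware"


class Candidate(NamedTuple):
    head: Entity
    relation: str
    tail: Entity
    score: float  # classifier confidence (hypothetical)


# Illustrative ontology subset: relation -> allowed (head type, tail type).
ALLOWED: Dict[str, Set[Tuple[str, str]]] = {
    "uses": {("threat-actor", "malware"), ("threat-actor", "tool")},
    "targets": {("threat-actor", "identity"), ("malware", "identity")},
    "exploits": {("malware", "vulnerability")},
}


def filter_by_ontology(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates whose type pair is allowed for the relation."""
    kept = []
    for cand in candidates:
        type_pair = (cand.head.type, cand.tail.type)
        # Unknown relations have no allowed pairs and are dropped.
        if type_pair in ALLOWED.get(cand.relation, set()):
            kept.append(cand)
    return kept


actor = Entity("GroupA", "threat-actor")
malware = Entity("MalwareX", "malware")
vuln = Entity("VulnY", "vulnerability")

candidates = [
    Candidate(actor, "uses", malware, 0.91),     # valid type pair
    Candidate(malware, "uses", actor, 0.62),     # reversed direction, rejected
    Candidate(malware, "exploits", vuln, 0.88),  # valid type pair
]

kept = filter_by_ontology(candidates)
for cand in kept:
    print(cand.head.text, cand.relation, cand.tail.text)
```

A hard filter like this can only remove type-inconsistent predictions. It cannot recover relations the classifier missed, so it would mainly affect precision.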
Problem

Research questions and friction points this paper is trying to address.

Entity-Relation Extraction
Cyber Threat Intelligence
Knowledge Graph Construction
Semantic Ambiguity
Error Propagation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-aware
SecureBERT+
Hybrid NLP
Threat Intelligence
Knowledge Graph