OntoLogX: Ontology-Guided Knowledge Graph Extraction from Cybersecurity Logs with Large Language Models

📅 2025-10-01
🤖 AI Summary
Cybersecurity logs are unstructured, semantically inconsistent, and fragmented across devices and sessions. To address this, the paper proposes OntoLogX, an autonomous AI agent that uses large language models (LLMs) to turn raw logs into ontology-aligned knowledge graphs. OntoLogX combines a lightweight log ontology, retrieval-augmented generation (RAG), and iterative correction steps so that the generated graphs are syntactically and semantically valid. It then aggregates event-level graphs into sessions and uses an LLM to map each session to MITRE ATT&CK tactics. Evaluated on logs from a public benchmark and on a real-world honeypot dataset, OntoLogX shows that retrieval and correction improve precision and recall, and that session-level graphs yield tactic-level threat intelligence.

📝 Abstract
System logs represent a valuable source of Cyber Threat Intelligence (CTI), capturing attacker behaviors, exploited vulnerabilities, and traces of malicious activity. Yet their utility is often limited by lack of structure, semantic inconsistency, and fragmentation across devices and sessions. Extracting actionable CTI from logs therefore requires approaches that can reconcile noisy, heterogeneous data into coherent and interoperable representations. We introduce OntoLogX, an autonomous Artificial Intelligence (AI) agent that leverages Large Language Models (LLMs) to transform raw logs into ontology-grounded Knowledge Graphs (KGs). OntoLogX integrates a lightweight log ontology with Retrieval-Augmented Generation (RAG) and iterative correction steps, ensuring that generated KGs are syntactically and semantically valid. Beyond event-level analysis, the system aggregates KGs into sessions and employs an LLM to predict MITRE ATT&CK tactics, linking low-level log evidence to higher-level adversarial objectives. We evaluate OntoLogX on both logs from a public benchmark and a real-world honeypot dataset, demonstrating robust KG generation across multiple KG backends and accurate mapping of adversarial activity to ATT&CK tactics. Results highlight the benefits of retrieval and correction for precision and recall, the effectiveness of code-oriented models in structured log analysis, and the value of ontology-grounded representations for actionable CTI extraction.
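The iterative correction step described in the abstract can be pictured as a generate, validate, retry loop. The sketch below is only an illustration under stated assumptions: the toy ontology, the predicate names, the `validate` and `generate_with_correction` helpers, and the stub generator standing in for an LLM call are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (subject class, predicate, object class); a simplified stand-in for KG triples.
Triple = Tuple[str, str, str]

# Hypothetical toy ontology: predicate -> (required subject class, required object class).
TOY_ONTOLOGY: Dict[str, Tuple[str, str]] = {
    "hasSourceAddress": ("Event", "IPAddress"),
    "hasUser": ("Event", "User"),
}


def validate(triples: List[Triple], ontology: Dict[str, Tuple[str, str]]) -> List[str]:
    """Return human-readable violations; an empty list means the graph is valid."""
    errors = []
    for subject, predicate, obj in triples:
        if predicate not in ontology:
            errors.append(f"unknown predicate: {predicate}")
            continue
        domain, rng = ontology[predicate]
        if subject != domain:
            errors.append(f"{predicate}: subject must be {domain}, got {subject}")
        if obj != rng:
            errors.append(f"{predicate}: object must be {rng}, got {obj}")
    return errors


def generate_with_correction(
    log_line: str,
    generate: Callable[[str, List[str]], List[Triple]],
    ontology: Dict[str, Tuple[str, str]],
    max_rounds: int = 3,
) -> Optional[List[Triple]]:
    """Call the generator, feed validation errors back, and stop once the graph is valid."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        triples = generate(log_line, feedback)
        feedback = validate(triples, ontology)
        if not feedback:
            return triples
    return None  # no valid graph within the round budget


def stub_generator(log_line: str, feedback: List[str]) -> List[Triple]:
    """Stand-in for an LLM call: wrong on the first try, corrected once it sees feedback."""
    if not feedback:
        return [("Event", "hasSrc", "IPAddress")]
    return [("Event", "hasSourceAddress", "IPAddress")]


result = generate_with_correction(
    "Failed password for root from 10.0.0.5", stub_generator, TOY_ONTOLOGY
)
print(result)
```

In a real pipeline the stub would be replaced by a model call whose prompt includes the ontology, any retrieved examples, and the validation errors from the previous round.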
Problem

Research questions and friction points this paper is trying to address.

Extracting structured threat intelligence from unstructured cybersecurity logs
Overcoming semantic inconsistencies across fragmented log data sources
Transforming raw logs into ontology-grounded knowledge graphs automatically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLMs to convert logs into ontology-based knowledge graphs
Integrates RAG with iterative correction for valid graphs
Aggregates graphs to predict MITRE ATT&CK tactics
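The aggregation step in the last item can be sketched as grouping event-level graphs by session and merging their triples before asking a model about tactics. This is a minimal illustration, assuming each event graph arrives tagged with a session identifier; the function names, the triple layout, the sample data, and the prompt wording are hypothetical rather than the paper's own.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def aggregate_by_session(
    event_graphs: List[Tuple[str, List[Triple]]]
) -> Dict[str, List[Triple]]:
    """Merge event-level triples per session, dropping duplicates and keeping first-seen order."""
    sessions: Dict[str, List[Triple]] = defaultdict(list)
    for session_id, triples in event_graphs:
        for triple in triples:
            if triple not in sessions[session_id]:
                sessions[session_id].append(triple)
    return dict(sessions)


def build_tactic_prompt(session_id: str, triples: List[Triple]) -> str:
    """Format one session graph as text that could be sent to an LLM (prompt wording is assumed)."""
    facts = "\n".join(f"- {s} {p} {o}" for s, p, o in triples)
    return (
        f"Session {session_id} contains these facts:\n{facts}\n"
        "Which MITRE ATT&CK tactics do they suggest?"
    )


# Invented sample data for illustration only.
events = [
    ("s1", [("event1", "hasUser", "root")]),
    ("s2", [("event2", "hasCommand", "wget")]),
    ("s1", [("event3", "hasCommand", "chmod"), ("event1", "hasUser", "root")]),
]

sessions = aggregate_by_session(events)
prompt = build_tactic_prompt("s1", sessions["s1"])
print(prompt)
```

Merging before prompting lets the model see the whole session at once, which is what allows low-level events to be read as steps toward a higher-level objective.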