SoK: Knowledge is All You Need: Last Mile Delivery for Automated Provenance-based Intrusion Detection with LLMs

📅 2025-03-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing provenance-based intrusion detection systems (PIDSes) suffer from severe knowledge fragmentation, which forces substantial manual intervention and impedes fully automated endpoint threat analysis. To address this, the paper proposes a knowledge-driven provenance-based intrusion detection framework. First, it introduces a systematic taxonomy of PIDSes based on the types of knowledge they use, described as the first of its kind. Second, it establishes a multi-source knowledge fusion paradigm that unifies attack representation, threat intelligence, benign behavior distillation, and provenance graph semantic parsing, bridging otherwise isolated knowledge silos. Third, it designs a large language model (LLM)-native detection framework tailored to provenance graphs, again described as the first of its kind. The approach is reported to outperform state-of-the-art methods on public benchmarks. The authors also open-source OmniSec, a reproducible system that integrates detection, attribution, and explainability.

📝 Abstract

Provenance-based intrusion detection systems (PIDSes) have recently been widely proposed for endpoint threat analysis. However, because they lack systematic integration and use of knowledge, existing PIDSes still require significant manual intervention in practical deployment, which makes full automation challenging. This paper takes a new perspective by categorizing PIDSes according to the types of knowledge they utilize. In response to the prevalent "knowledge silos" problem in existing research, we introduce a novel knowledge-driven provenance-based intrusion detection framework powered by large language models (LLMs). We also present OmniSec, a best-practice system built on this framework. By integrating attack representation knowledge, threat intelligence knowledge, and benign behavior knowledge, OmniSec outperforms state-of-the-art approaches on public benchmark datasets. OmniSec is available online at https://anonymous.4open.science/r/PIDS-with-LLM-613B.
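
The abstract names three knowledge sources (attack representation, threat intelligence, and benign behavior) without showing how they might meet in one pipeline. The following is a minimal sketch of one way such sources could be combined into a single LLM prompt. It is not OmniSec's implementation: the `Edge` type, the helper functions, the sample events, and the prompt wording are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One provenance event: subject --action--> object (hypothetical schema)."""
    subject: str
    action: str
    obj: str


def filter_benign(edges, benign_profile):
    """Benign behavior knowledge: drop events already seen in normal operation."""
    return [e for e in edges if (e.subject, e.action, e.obj) not in benign_profile]


def tag_with_intel(edges, iocs):
    """Threat intelligence knowledge: mark events that touch a known indicator."""
    return [(e, e.subject in iocs or e.obj in iocs) for e in edges]


def build_prompt(tagged, attack_notes):
    """Attack representation knowledge: give the LLM pattern descriptions as context."""
    lines = ["You are analyzing endpoint provenance events.", "Known attack patterns:"]
    lines += [f"- {note}" for note in attack_notes]
    lines.append("Events not explained by the benign profile:")
    for edge, hit in tagged:
        flag = " [matches threat intel]" if hit else ""
        lines.append(f"- {edge.subject} {edge.action} {edge.obj}{flag}")
    lines.append("Which events, if any, look like part of an intrusion, and why?")
    return "\n".join(lines)


# Illustrative inputs only; none of these values come from the paper.
edges = [
    Edge("sshd", "fork", "bash"),
    Edge("bash", "exec", "/tmp/dropper"),
    Edge("/tmp/dropper", "connect", "203.0.113.7:443"),
]
benign_profile = {("sshd", "fork", "bash")}
iocs = {"203.0.113.7:443"}
attack_notes = ["A process executed from /tmp that opens an outbound connection"]

suspicious = filter_benign(edges, benign_profile)
prompt = build_prompt(tag_with_intel(suspicious, iocs), attack_notes)
print(prompt)
```

In this sketch the benign profile shrinks the graph before the LLM sees it, and the threat-intelligence match is passed along as an annotation rather than a verdict, so the final judgment stays with the model. Sending `prompt` to an actual LLM is left out because the source does not specify a model or API.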

Problem

Research questions and friction points this paper is trying to address.

Addresses lack of systematic knowledge integration in PIDSes.

Introduces knowledge-driven framework using large language models.

Proposes OmniSec for automated, superior intrusion detection.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes large language models for provenance-based intrusion detection.

Integrates attack representation, threat intelligence, and benign behavior knowledge.

Introduces the OmniSec system for automated threat analysis.
