🤖 AI Summary
Existing provenance-based intrusion detection systems (PIDSes) suffer from severe knowledge fragmentation, which necessitates substantial manual intervention and impedes fully automated endpoint threat analysis. To address this, we propose a knowledge-driven provenance-based intrusion detection framework. First, we introduce a systematic taxonomy of PIDSes based on knowledge types, the first of its kind. Second, we establish a multi-source knowledge fusion paradigm that unifies attack representation, threat intelligence, benign behavior distillation, and provenance graph semantic parsing, bridging previously isolated knowledge silos. Third, we design the first large language model (LLM)-native detection framework explicitly tailored for provenance graphs. Our approach achieves significant performance gains over state-of-the-art methods on public benchmarks. Furthermore, we open-source OmniSec, a fully reproducible system enabling integrated detection, attribution, and explainability.
📝 Abstract
Provenance-based intrusion detection systems (PIDSes) have recently been widely proposed for endpoint threat analysis. However, because they do not systematically integrate and utilize knowledge, existing PIDSes still require significant manual intervention in practical deployment, which makes full automation challenging. This paper takes a new perspective by categorizing PIDSes according to the types of knowledge they utilize. To address the prevalent "knowledge silos" problem in existing research, we introduce a novel knowledge-driven provenance-based intrusion detection framework powered by large language models (LLMs). We also present OmniSec, a best-practice system built upon this framework. By integrating attack representation knowledge, threat intelligence knowledge, and benign behavior knowledge, OmniSec outperforms state-of-the-art approaches on public benchmark datasets. OmniSec is available online at https://anonymous.4open.science/r/PIDS-with-LLM-613B.