🤖 AI Summary
This study addresses the lack of systematic evaluation of provenance-based intrusion detection systems (PIDSes) in industrial settings. In what we believe is the first such evaluation, we build several datasets from real-world industrial environments, simulate advanced persistent threat (APT) attacks, and benchmark five state-of-the-art PIDSes. Compared with DARPA datasets, industrial data shows three new characteristics: heterogeneous multi-source inputs, more powerful attackers, and increasingly complex benign activity. The evaluation in turn reveals challenges for existing PIDSes, including poor portability across hosts and platforms, low detection performance against real-world attacks, and high false-positive rates as benign activity changes. To mitigate the false positives, we propose a method that reduces manual verification effort by about two-thirds. Finally, we offer several research suggestions for improving PIDSes in industrial contexts.
📝 Abstract
Provenance-based Intrusion Detection Systems (PIDSes) have been widely used to detect Advanced Persistent Threats (APTs). Although many PIDSes report high performance in the evaluations of their original papers, their performance in industrial scenarios remains unclear. To fill this gap, we conduct the first systematic evaluation and analysis of PIDSes in industrial scenarios. We first analyze the differences between data from the DARPA datasets and data collected in industrial scenarios, identifying three new characteristics of industrial data: heterogeneous multi-source inputs, more powerful attackers, and increasingly complex benign activity. We then build several datasets to evaluate five state-of-the-art PIDSes. The evaluation results reveal challenges for existing PIDSes, including poor portability across different hosts and platforms, low detection performance against real-world attacks, and high false positive rates under ever-changing benign activity. Based on the evaluation results and our industrial practice, we provide several insights that solve or explain these problems. For example, we propose a method to mitigate the high false positive rate, which reduces manual effort by 2/3. Finally, we propose several research suggestions to improve PIDSes.