🤖 AI Summary
To address the challenge of detecting sophisticated, runtime-hidden Advanced Persistent Threats (APTs) in supply chains where source code is unavailable, this paper proposes a real-time detection framework coupling dynamic provenance graphs with a distributed temporal Graph Neural Network (DyGNN). Methodologically, it integrates heterogeneous, kernel-level multi-source logs via eBPF to construct dynamic provenance graphs and introduces an attack-replay-driven, customized data synthesis mechanism—filling a critical gap in domain-specific APT datasets. The key contributions are: (1) the first integration of dynamic provenance graphs with distributed spatiotemporal graph learning for runtime APT detection in source-code-unavailable settings; and (2) achieving 98.7% detection accuracy, an average response latency of 1.3 seconds, and a 62% reduction in false positives over baseline methods, while enabling fine-grained, process-level attack attribution.
📝 Abstract
Cyber supply chain, encompassing digital asserts, software, hardware, has become an essential component of modern Information and Communications Technology (ICT) provisioning. However, the growing inter-dependencies have introduced numerous attack vectors, making supply chains a prime target for exploitation. In particular, advanced persistent threats (APTs) frequently leverage supply chain vulnerabilities (SCVs) as entry points, benefiting from their inherent stealth. Current defense strategies primarly focus on prevention through blockchain for integrity assurance or detection using plain-text source code analysis in open-source software (OSS). However, these approaches overlook scenarios where source code is unavailable and fail to address detection and defense during runtime. To bridge this gap, we propose a novel approach that integrates multi-source data, constructs a comprehensive dynamic provenance graph, and detects APT behavior in real time using temporal graph learning. Given the lack of tailored datasets in both industry and academia, we also aim to simulate a custom dataset by replaying real-world supply chain exploits with multi-source monitoring.