🤖 AI Summary
This study addresses the challenge of detecting sophisticated software supply chain attacks, which are highly stealthy and leave fragmented evidence across hosts, services, and build/dependency layers, so that no single telemetry source suffices to reconstruct the full attack chain. To this end, the authors develop SynthChain, a near-production testbed that reproduces seven representative attack scenarios derived from real-world malicious packages and exploit campaigns, covering PyPI, npm, and a native C/C++ supply-chain case. The platform collects multi-source runtime data, including system logs, network traffic, and process behaviors, and provides chain-level ground-truth annotations aligned with the MITRE ATT&CK framework. The resulting benchmark dataset comprises approximately 580,000 raw events and 1.5 million evaluation rows, and the authors quantify observability constraints by mapping each attack step to the minimum evidence needed to detect it. Experiments show that the best single telemetry source reaches only 0.391 weighted tag/step coverage and 0.403 mean chain reconstruction, whereas minimal two-source fusion raises coverage to 0.636 and reconstruction to 0.639 (approximately 1.6×), with consistent improvements in chain coverage and recall.
📝 Abstract
Advanced software supply chain (SSC) attacks are increasingly runtime-only and leave fragmented evidence across hosts, services, and build/dependency layers, so any single telemetry stream is inherently insufficient to reconstruct full compromise chains under realistic access and budget limits. We present SynthChain, a near-production testbed and a multi-source runtime dataset with chain-level ground truth, derived from real-world malicious packages and exploit campaigns. SynthChain covers seven representative supply-chain exploit scenarios across PyPI, npm, and a native C/C++ supply-chain case, spanning Windows and Linux, and involving four hosts and one containerized environment. Scenarios span realistic time windows from minutes to hours and are annotated with 14 MITRE ATT&CK tactics and 161 techniques (29-104 techniques per scenario). Beyond releasing the data, we quantify observability constraints by mapping each chain step to the minimum evidence needed for detection and cross-source correlation. With realistic trace availability, no single source is chain-complete: the best single source reaches only 0.391 weighted tag/step coverage and 0.403 mean chain reconstruction. Even minimal two-source fusion boosts coverage to 0.636 and reconstruction to 0.639 (approximately 1.6x gain), with consistent chain coverage/recall improvements (0.545). The corpus contains approximately 0.58M raw multi-source events and 1.50M evaluation rows, enabling controlled studies of detection under constrained telemetry. We release the dataset, ground truth, and artifacts to support reproducible, forensic-aware runtime defenses and to guide efficient detection for software supply chains.
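To make the coverage comparison concrete, the following is a minimal sketch of how weighted step coverage could be computed for a single telemetry source versus a two-source fusion. The chain steps, weights, and source names (`host_process`, `network`, `audit_log`) are hypothetical and are not taken from the SynthChain dataset, and the scoring rule (a step counts as covered if at least one available source can observe it) is an assumption rather than the authors' exact metric.

```python
from itertools import combinations

# Hypothetical toy chain: (step name, weight, sources able to observe the step).
# None of these values come from SynthChain; they only illustrate the idea.
CHAIN = [
    ("package_install", 1.0, {"host_process"}),
    ("payload_download", 2.0, {"network"}),
    ("credential_access", 2.0, {"host_process", "audit_log"}),
    ("exfiltration", 1.0, {"network"}),
]


def weighted_coverage(chain, available):
    """Share of total step weight observable by at least one available source."""
    total = sum(weight for _, weight, _ in chain)
    seen = sum(weight for _, weight, sources in chain if sources & available)
    return seen / total


def best_subset(chain, sources, k):
    """Return (coverage, subset) for the best-scoring combination of k sources."""
    candidates = combinations(sorted(sources), k)
    scored = ((weighted_coverage(chain, set(c)), c) for c in candidates)
    return max(scored, key=lambda pair: pair[0])


ALL_SOURCES = set().union(*(sources for _, _, sources in CHAIN))

single_cov, single_src = best_subset(CHAIN, ALL_SOURCES, 1)
pair_cov, pair_src = best_subset(CHAIN, ALL_SOURCES, 2)
print(f"best single source {single_src}: {single_cov:.3f}")
print(f"best two-source fusion {pair_src}: {pair_cov:.3f}")

# Gains implied by the figures reported in the abstract.
coverage_gain = 0.636 / 0.391
reconstruction_gain = 0.639 / 0.403
print(f"coverage gain: {coverage_gain:.2f}x, reconstruction gain: {reconstruction_gain:.2f}x")
```

The last lines reproduce the arithmetic behind the reported "approximately 1.6x" figure: 0.636 / 0.391 is about 1.63 for coverage, and 0.639 / 0.403 is about 1.59 for reconstruction.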