🤖 AI Summary
Problem: Existing PyPI supply-chain attack datasets fail to capture multi-stage execution, remote activation, and dynamically loaded payloads; because they rely on static analysis, malicious behavior that occurs after installation is overlooked. Method: We construct QUT-DV25, the first full-lifecycle dynamic analysis dataset for the PyPI ecosystem, comprising 14,271 Python packages (7,127 of them malicious) and capturing 36 real-time behavioral features (e.g., system calls, network flows, resource usage). The sandbox is instrumented with eBPF kernel-level and user-level probes, enabling fine-grained monitoring across the package installation, initialization, and runtime phases. Contribution/Results: Using this dataset, we identified four stealthy malicious packages, each with thousands of downloads, and reported them to PyPI maintainers, who removed them. Empirical evaluation shows improved detection of multi-stage and dynamically loaded attacks compared with metadata-based and static datasets, supporting model training and standardized benchmarking for PyPI supply-chain security.
📝 Abstract
Securing software supply chains is a growing challenge because existing datasets fail to capture the complexity of next-generation attacks, such as multiphase malware execution, remote access activation, and dynamic payload generation. These datasets rely on metadata inspection and static code analysis, so they do not capture what happens during and after a package is installed, which leaves a critical gap in detecting such attacks. To address this gap, we present QUT-DV25, a dynamic analysis dataset designed to support research on detecting and mitigating supply chain attacks within the Python Package Index (PyPI) ecosystem. The dataset captures install-time and post-install-time traces from 14,271 Python packages, of which 7,127 are malicious. The packages are executed in an isolated sandbox environment instrumented with extended Berkeley Packet Filter (eBPF) kernel-level and user-level probes. The dataset records 36 real-time features, including system calls, network traffic, resource usage, directory access patterns, dependency logs, and installation behaviors, enabling the study of next-generation attack vectors. Machine learning analysis using QUT-DV25 identified four malicious PyPI packages that had previously been labeled as benign, each with thousands of downloads. These packages deployed covert remote access and multi-phase payloads; they were reported to PyPI maintainers and subsequently removed. This highlights the practical value of QUT-DV25, which outperforms reactive, metadata-based, and static datasets and offers a robust foundation for developing and benchmarking advanced threat detection within the evolving software supply chain ecosystem.
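To make the dataset composition concrete, the following is a minimal Python sketch of how one package's trace record might be organized and how the class balance follows from the stated counts. Only the package counts and the six trace categories come from the abstract; the `PackageTrace` class, its field names, the category identifiers, and the example package name are hypothetical and do not reflect the dataset's actual schema.

```python
from dataclasses import dataclass, field

# Counts stated in the abstract.
TOTAL_PACKAGES = 14_271
MALICIOUS_PACKAGES = 7_127

# Trace categories named in the abstract; the identifiers are illustrative.
TRACE_CATEGORIES = (
    "system_calls",
    "network_traffic",
    "resource_usage",
    "directory_access",
    "dependency_logs",
    "installation_behavior",
)


@dataclass
class PackageTrace:
    """Hypothetical record layout, not the dataset's actual schema."""

    name: str
    is_malicious: bool
    traces: dict = field(default_factory=dict)

    def missing_categories(self):
        # Categories for which no trace was recorded for this package.
        return [c for c in TRACE_CATEGORIES if c not in self.traces]


def class_balance(total, malicious):
    """Return the benign count and the malicious fraction."""
    benign = total - malicious
    return benign, malicious / total


benign, malicious_fraction = class_balance(TOTAL_PACKAGES, MALICIOUS_PACKAGES)
print(f"benign={benign}, malicious fraction={malicious_fraction:.4f}")

# A made-up record with only one category populated.
example = PackageTrace("example-pkg", False, {"system_calls": ["openat", "connect"]})
print(example.missing_categories())
```

The two classes are nearly balanced (7,144 benign against 7,127 malicious), so a classifier trained on the dataset would not need heavy resampling to offset class skew.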