🤖 AI Summary
This work addresses security risks, such as build tampering and source-artifact divergence, that arise from the lack of reproducibility in scripting-language package ecosystems (e.g., Python, JavaScript). It presents the first cross-ecosystem Systematization of Knowledge (SoK) on reproducibility, systematically modeling and comparing build mechanisms across scripting languages and compiled-language distributions (e.g., C/C++ in Linux distros) to identify both common challenges and ecosystem-specific barriers. We find existing efforts highly fragmented, focusing narrowly on isolated languages or single dimensions (e.g., dependency resolution or build environment control). To address this, we propose a unified challenge taxonomy, map critical knowledge gaps, and distill high-priority, cross-ecosystem mitigation strategies. Our SoK fills a foundational gap in reproducibility research for scripting languages and provides both theoretical grounding and a practical roadmap for securing scripting-language software supply chains.
📝 Abstract
The disconnect between distributed software artifacts and their supposed source code enables attackers to leverage the build process for inserting malicious functionality. Past research in this field focuses on compiled language ecosystems, mostly analysing Linux distribution packages. However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This SoK provides an overview of existing research, aiming to highlight future directions as well as opportunities to transfer existing knowledge from compiled language ecosystems. To that end, we identify key aspects of current research, systematize the identified challenges for software reproducibility, and map them between the ecosystems. We find that the literature is sparse, focusing on a few individual problems and ecosystems. This allows us to effectively identify next steps to improve reproducibility in this field.