SoK: Towards Reproducibility for Software Packages in Scripting Language Ecosystems

📅 2025-03-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses security risks, such as build tampering and divergence between source code and distributed artifacts, that arise from the lack of reproducibility in scripting-language package ecosystems (e.g., Python, JavaScript). It presents the first cross-ecosystem Systematization of Knowledge (SoK) on reproducibility, systematically modeling and comparing build mechanisms across scripting languages and compiled-language distributions (e.g., C/C++ packages in Linux distributions) to identify both common challenges and ecosystem-specific barriers. It finds existing efforts highly fragmented, focusing narrowly on isolated languages or single dimensions (e.g., dependency resolution or build environment control). To address this, the SoK proposes a unified challenge taxonomy, maps critical knowledge gaps, and distills high-priority, cross-ecosystem mitigation strategies. It fills a foundational gap in reproducibility research for scripting languages and provides both theoretical grounding and a practical roadmap for securing scripting-language software supply chains.
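To make "build environment control" concrete, here is a minimal sketch (an illustration, not the paper's method) of why two builds of identical sources can still differ byte for byte. It packs the same hypothetical files into a zip archive, the container format used by, e.g., Python wheels. With a pinned timestamp, sorted entry order, and fixed permissions, two builds hash identically; changing only the embedded timestamp breaks that. The file names, helper names, and fixed date are assumptions for illustration; the pinned date stands in for a build time such as one derived from `SOURCE_DATE_EPOCH`.

```python
import hashlib
import io
import zipfile

# Hypothetical source files to package (path -> content).
SOURCES = {
    "pkg/__init__.py": b"VERSION = '1.0'\n",
    "pkg/core.py": b"def add(a, b):\n    return a + b\n",
}

# Assumed pinned timestamp (the earliest date a zip entry can store).
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def build_archive(files, date_time):
    """Pack files into an in-memory zip with sorted entries and a given mtime."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path in sorted(files):  # deterministic entry order
            info = zipfile.ZipInfo(path, date_time=date_time)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.external_attr = 0o644 << 16  # fixed Unix permissions
            zf.writestr(info, files[path])
    return buf.getvalue()


def digest(data):
    """SHA-256 hex digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


# Two builds with the same pinned timestamp produce identical bytes.
a = build_archive(SOURCES, FIXED_DATE)
b = build_archive(SOURCES, FIXED_DATE)
print(digest(a) == digest(b))  # True

# Same file contents, different embedded timestamp: the bytes differ.
c = build_archive(SOURCES, (2025, 3, 27, 12, 0, 0))
print(digest(a) == digest(c))  # False
```

Timestamps are only one source of nondeterminism; entry order, permissions, and toolchain versions can have the same effect, which is why the sketch pins all three it controls.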

📝 Abstract

The disconnect between distributed software artifacts and their supposed source code enables attackers to leverage the build process for inserting malicious functionality. Past research in this field focuses on compiled language ecosystems, mostly analysing Linux distribution packages. However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This SoK provides an overview of existing research, aiming to highlight future directions, as well as opportunities to transfer existing knowledge from compiled language ecosystems. To that end, we identify key aspects of current research, systematize identified challenges for software reproducibility, and map them between the ecosystems. We find that the literature is sparse, focusing on few individual problems and ecosystems. This allows us to effectively identify next steps to improve reproducibility in this field.
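Because scripting-language artifacts often ship files that also exist in the source repository, one plausible first-pass check for a disconnect between artifact and source is a file-by-file comparison. The sketch below is a hypothetical illustration, not a technique taken from the paper: `file_digests_from_zip` and `compare` are illustrative names, the source tree and the tampered artifact are invented, and the `ignore_prefixes` parameter is an assumed way to skip generated packaging metadata (such as a wheel's `.dist-info` directory), which a real check would need to handle more carefully.

```python
import hashlib
import io
import zipfile


def file_digests_from_zip(data):
    """Map each archive member path to the SHA-256 digest of its content."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {
            name: hashlib.sha256(zf.read(name)).hexdigest()
            for name in zf.namelist()
            if not name.endswith("/")  # skip directory entries
        }


def compare(source_files, artifact_digests, ignore_prefixes=()):
    """Report artifact files absent from the source, or whose content differs."""
    added, modified = [], []
    for path, h in sorted(artifact_digests.items()):
        if path.startswith(ignore_prefixes):
            continue  # generated metadata, not expected in the source tree
        if path not in source_files:
            added.append(path)
        elif hashlib.sha256(source_files[path]).hexdigest() != h:
            modified.append(path)
    return added, modified


# Hypothetical source tree and an artifact that diverges from it.
source = {
    "pkg/__init__.py": b"VERSION = '1.0'\n",
    "pkg/core.py": b"def add(a, b):\n    return a + b\n",
}
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("pkg/__init__.py", source["pkg/__init__.py"])
    zf.writestr("pkg/core.py", b"import os\ndef add(a, b):\n    return a + b\n")
    zf.writestr("pkg/_hook.py", b"# not in the source repository\n")
    zf.writestr("pkg-1.0.dist-info/METADATA", b"Name: pkg\n")

added, modified = compare(source, file_digests_from_zip(buf.getvalue()),
                          ignore_prefixes=("pkg-1.0.dist-info/",))
print(added)     # ['pkg/_hook.py']
print(modified)  # ['pkg/core.py']
```

Such a content diff only works where artifacts are close to source; compiled or minified outputs would instead require rebuilding and comparing, as in compiled-language ecosystems.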

Problem

Research questions and friction points this paper is trying to address.

Preventing attackers from abusing the build process to insert malicious functionality in scripting language ecosystems

Systematizing challenges for software reproducibility across ecosystems

Identifying research gaps to improve reproducibility in scripting languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes reproducibility issues specific to scripting language ecosystems

Systematizes challenges for software reproducibility across ecosystems

Identifies next steps to enhance reproducibility research