Unlocking Reproducibility: Automating re-Build Process for Open-Source Software

2025-09-09
AI Summary
Binary artifacts in ecosystems like Maven Central are distributed separately from their source code, and opaque build environments introduce security risks, including untrusted CI/CD pipelines, non-reproducible builds, and undetectable dependency tampering. To address these challenges, this paper proposes an automated rebuild-from-source framework built as an extension to Macaron. It combines source-code detection with extraction of build specifications from GitHub Actions workflows to recover critical configuration parameters (e.g., JDK version, build commands). It also presents a root-cause analysis of build failures in Java projects and a scalable approach to rebuilding artifacts. The paper reports better source-code detection than existing tools. The framework supports source-level auditing of the software supply chain and strengthens defenses against malicious builds and supply-chain contamination.

Abstract
Software ecosystems like Maven Central play a crucial role in modern software supply chains by providing repositories for libraries and build plugins. However, the separation between binaries and their corresponding source code in Maven Central presents a significant challenge, particularly when it comes to linking binaries back to their original build environment. This lack of transparency poses security risks, as approximately 84% of the top 1200 commonly used artifacts are not built using a transparent CI/CD pipeline. Consequently, users must place a significant amount of trust not only in the source code but also in the environment in which these artifacts are built. Rebuilding software artifacts from source provides a robust solution to improve supply chain security. This approach allows for a deeper review of code, verification of binary-source equivalence, and control over dependencies. However, challenges arise due to variations in build environments, such as JDK versions and build commands, which can lead to build failures. Additionally, ensuring that all dependencies are rebuilt from source across large and complex dependency graphs further complicates the process. In this paper, we introduce an extension to Macaron, an industry-grade open-source supply chain security framework, to automate the rebuilding of Maven artifacts from source. Our approach improves upon existing tools by offering better performance in source code detection and by automating the extraction of build specifications from GitHub Actions workflows. We also present a comprehensive root cause analysis of build failures in Java projects and propose a scalable solution to automate the rebuilding of artifacts, ultimately enhancing security and transparency in the open-source supply chain.
Problem

Research questions and friction points this paper is trying to address.

Automating rebuild process for Maven artifacts from source
Addressing binary-source separation in software supply chains
Enhancing security through transparent CI/CD pipeline verification
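
To make the idea of checking binary-source equivalence concrete, here is a minimal sketch that compares a published JAR with a locally rebuilt one, entry by entry. It is an illustration only and not the paper's or Macaron's method: the function names (`entry_digests`, `compare_jars`, `make_jar`) are hypothetical, and it assumes that comparing SHA-256 digests of archive entries is an acceptable first-pass check. Real comparisons typically also need to normalize timestamps, manifest fields, and entry ordering.

```python
import hashlib
import io
import zipfile


def entry_digests(jar_bytes: bytes) -> dict[str, str]:
    """Map each file entry of a JAR (a ZIP archive) to the SHA-256 of its content."""
    digests = {}
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for info in jar.infolist():
            if info.is_dir():
                continue  # directory entries carry no content to compare
            digests[info.filename] = hashlib.sha256(jar.read(info)).hexdigest()
    return digests


def compare_jars(published: bytes, rebuilt: bytes) -> dict[str, list[str]]:
    """Report entries that are missing on either side or whose content differs."""
    a, b = entry_digests(published), entry_digests(rebuilt)
    return {
        "only_in_published": sorted(a.keys() - b.keys()),
        "only_in_rebuilt": sorted(b.keys() - a.keys()),
        "content_differs": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


def make_jar(entries: dict[str, bytes]) -> bytes:
    """Hypothetical helper: build an in-memory JAR so the sketch runs without files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name, data in entries.items():
            jar.writestr(name, data)
    return buf.getvalue()


published = make_jar({"A.class": b"x", "B.class": b"y"})
rebuilt = make_jar({"A.class": b"x", "B.class": b"z", "C.class": b"w"})
report = compare_jars(published, rebuilt)
print(report)
```

Comparing per-entry content rather than whole-archive hashes sidesteps differences in archive metadata such as entry timestamps, which commonly vary between otherwise identical builds.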
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automates rebuilding Maven artifacts from source
Extracts build specifications from GitHub Actions
Performs root cause analysis for build failures
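
As a rough illustration of extracting build specifications from a GitHub Actions workflow, the sketch below scans workflow text for the `java-version` input of `actions/setup-java` and for single-line `run:` steps that invoke Maven or Gradle. This is not Macaron's implementation: `extract_build_spec`, `BUILD_TOOLS`, and the sample workflow are assumptions made for the example, and the line-based matching deliberately ignores multi-line `run: |` blocks, matrix expressions, and reusable workflows that a real extractor would have to handle.

```python
import re

# Assumed set of executables that mark a step as a build command.
BUILD_TOOLS = ("mvn", "./mvnw", "gradle", "./gradlew")

# Hypothetical workflow used only to exercise the sketch.
SAMPLE_WORKFLOW = """\
name: build
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
      - name: Build
        run: mvn -B clean package -DskipTests
      - run: echo done
"""


def extract_build_spec(workflow_text: str) -> dict[str, list[str]]:
    """Collect JDK versions and single-line build commands from workflow text."""
    jdk_versions: list[str] = []
    build_commands: list[str] = []
    for line in workflow_text.splitlines():
        stripped = line.strip()
        # JDK version as passed to actions/setup-java, with or without quotes.
        match = re.match(r"java-version:\s*['\"]?([\w.\-]+)['\"]?", stripped)
        if match:
            jdk_versions.append(match.group(1))
            continue
        # Single-line run steps, optionally starting a new list item.
        match = re.match(r"(?:-\s*)?run:\s*(.+)", stripped)
        if match and match.group(1).split()[0] in BUILD_TOOLS:
            build_commands.append(match.group(1).strip())
    return {"jdk_versions": jdk_versions, "build_commands": build_commands}


spec = extract_build_spec(SAMPLE_WORKFLOW)
print(spec)
```

A production extractor would parse the YAML properly instead of matching lines, but the sketch shows the kind of parameters (JDK version, build command) that a rebuild needs to recover.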
Behnaz Hassanshahi
Oracle Labs, Brisbane, Australia
Trong Nhan Mai
Oracle Labs, Brisbane, Australia
Benjamin Selwyn Smith
Oracle Labs, Brisbane, Australia
Nicholas Allen
Ann Swindells Professor of Psychology, University of Oregon