Computational Reproducibility of R Code Supplements on OSF

📅 2025-05-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the widespread lack of computational reproducibility in R supplementary code deposited on the Open Science Framework (OSF). A systematic audit of 296 published R code packages (of which 264 were still retrievable) revealed that 98.8% declared their dependencies incompletely. To address this, the authors propose an automated reproducibility auditing framework tailored to the R ecosystem. It combines static source-code analysis, leveraging regular expressions and abstract syntax trees (ASTs) to infer dependencies, with Docker-based containerized execution and failure diagnostics (e.g., path errors, OS-specific inconsistencies, missing packages) to enable end-to-end environment reconstruction and validation. Experiments successfully executed 25.87% of scripts, identifying undeclared dependencies, hardcoded file paths, and cross-platform compatibility issues as the three primary barriers to reproducibility. The framework enables large-scale, low-cost quantitative assessment of computational reproducibility in scholarly research, providing a practical toolchain to enhance transparency and verifiability.
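The dependency-inference step described above can be illustrated with a minimal regex-based sketch. This is not the paper's actual implementation (which also uses AST analysis); the patterns below are illustrative heuristics for the three most common ways R scripts reference packages.

```python
import re

# Heuristic patterns for R dependency declarations:
# library(pkg), require(pkg), requireNamespace("pkg"), and pkg::fn.
# Illustrative only -- the paper's framework also parses the AST.
DEP_PATTERNS = [
    re.compile(r'\b(?:library|require)\s*\(\s*["\']?([A-Za-z][\w.]*)'),
    re.compile(r'\brequireNamespace\s*\(\s*["\']([A-Za-z][\w.]*)["\']'),
    re.compile(r'\b([A-Za-z][\w.]*)\s*::'),
]

def infer_dependencies(r_source: str) -> set[str]:
    """Return the set of package names referenced in an R script."""
    deps = set()
    for pattern in DEP_PATTERNS:
        deps.update(pattern.findall(r_source))
    return deps

script = '''
library(ggplot2)
require("dplyr")
result <- stringr::str_trim("  x ")
'''
print(sorted(infer_dependencies(script)))  # ['dplyr', 'ggplot2', 'stringr']
```

Regex alone misses dynamically loaded packages (e.g. `do.call("library", ...)`), which is one reason the framework pairs pattern matching with AST analysis.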

📝 Abstract
Computational reproducibility is fundamental to scientific research, yet many published code supplements lack the necessary documentation to recreate their computational environments. While researchers increasingly share code alongside publications, the actual reproducibility of these materials remains poorly understood. In this work, we assess the computational reproducibility of 296 R projects using the StatCodeSearch dataset. Of these, only 264 were still retrievable, and 98.8% lacked formal dependency descriptions required for successful execution. To address this, we developed an automated pipeline that reconstructs computational environments directly from project source code. Applying this pipeline, we executed the R scripts within custom Docker containers and found that 25.87% completed successfully without error. We conducted a detailed analysis of execution failures, identifying reproducibility barriers such as undeclared dependencies, invalid file paths, and system-level issues. Our findings show that automated dependency inference and containerisation can support scalable verification of computational reproducibility and help identify practical obstacles to code reuse and transparency in scientific research.
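The environment-reconstruction step in the abstract, inferring dependencies and then executing scripts inside a purpose-built container, can be sketched as a small Dockerfile generator. The base image (`rocker/r-ver`), pinned R version, and CRAN mirror below are assumptions for illustration, not details from the paper.

```python
def make_dockerfile(packages: set[str], r_version: str = "4.3.2") -> str:
    """Emit a Dockerfile installing inferred packages into a rocker base image.

    Image name, R version, and repo URL are illustrative assumptions.
    """
    install = ", ".join(f"'{p}'" for p in sorted(packages))
    return "\n".join([
        f"FROM rocker/r-ver:{r_version}",
        # Install the inferred dependencies from a CRAN mirror.
        f'RUN Rscript -e "install.packages(c({install}), repos=\'https://cloud.r-project.org\')"',
        # Copy the project into the container and run scripts from there.
        "COPY . /workspace",
        "WORKDIR /workspace",
        'ENTRYPOINT ["Rscript"]',
    ])

print(make_dockerfile({"dplyr", "ggplot2"}))
```

Building this image and running each script with `docker run` then yields a pass/fail signal per script, plus stderr for failure diagnosis.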
Problem

Research questions and friction points this paper is trying to address.

Assessing computational reproducibility of R projects
Identifying barriers like undeclared dependencies and file paths
Developing automated pipeline for environment reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline reconstructs computational environments
Uses Docker containers for script execution
Diagnoses execution failures to identify reproducibility barriers
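The failure-diagnosis contribution above amounts to classifying execution errors into barrier categories. A minimal sketch, assuming rule-based matching on R's stderr output (the category names and patterns here are illustrative, not the paper's taxonomy):

```python
import re

# Heuristic mapping from R error text to failure categories.
# The first two patterns are standard R error messages; the
# OS-specific pattern is an illustrative assumption.
FAILURE_RULES = [
    ("missing_package", re.compile(r"there is no package called")),
    ("path_error", re.compile(r"cannot open (?:file|the connection)"
                              r"|No such file or directory")),
    ("os_specific", re.compile(r"only works on Windows|X11 is not available")),
]

def classify_failure(stderr: str) -> str:
    """Return the first matching failure category, or 'other'."""
    for label, pattern in FAILURE_RULES:
        if pattern.search(stderr):
            return label
    return "other"

print(classify_failure("Error in library(xlsx): there is no package called 'xlsx'"))
# missing_package
```

Aggregating these labels across all 296 projects is what lets the framework rank undeclared dependencies, hardcoded paths, and platform issues as the dominant barriers.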