🤖 AI Summary
HPC research suffers from persistent reproducibility challenges caused by resource exclusivity, restricted access, and heterogeneous environments. To address this, we propose CORRECT, a GitHub Actions–based continuous integration framework designed specifically for HPC that enables secure, automated reproducibility validation on remote supercomputing resources. CORRECT integrates fine-grained execution provenance tracking, lightweight containerized deployment, and strict permission isolation, working around the incompatibility of conventional CI tools with HPC workload managers such as Slurm. Empirical evaluation across three representative categories of HPC applications (scientific simulation, AI training, and performance benchmarking) shows that CORRECT improves the automation, transparency, and documentation completeness of reproducibility assessment, offering a practical engineering path toward more reproducible research in HPC.
📝 Abstract
The high-performance computing (HPC) community has adopted incentive structures to motivate reproducible research, with major conferences awarding badges to papers that meet reproducibility requirements. Yet many papers fail to meet them. The uniqueness of HPC infrastructure and software, coupled with strict access restrictions, can limit opportunities for reproducibility. In the absence of resource access, we believe that regular, documented testing through continuous integration (CI), coupled with complete provenance information, can serve as a substitute. Here, we argue that better HPC-compliant CI solutions will improve the reproducibility of HPC applications. We present a survey of reproducibility initiatives and describe the barriers to reproducibility in HPC. To address existing limitations, we present a GitHub Action, CORRECT, that enables secure execution of tests on remote HPC resources. We evaluate CORRECT's usability across three different types of HPC applications, demonstrating the effectiveness of using CORRECT for automating and documenting reproducibility evaluations.
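As a rough sketch of how a GitHub Action of this kind might be wired into a repository, consider the hypothetical workflow below. The action reference (`example-org/correct-action@v1`), all input names (`host`, `scheduler`, `job-script`), and the secret name are illustrative assumptions for this sketch, not CORRECT's documented interface:

```yaml
# Hypothetical workflow: run a repository's reproducibility tests on a
# remote Slurm-managed system on every push. Every identifier below that
# is not a standard GitHub Actions keyword is a placeholder assumption.
name: hpc-reproducibility-check
on: [push]

jobs:
  remote-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests on the HPC system via Slurm
        uses: example-org/correct-action@v1    # hypothetical action reference
        with:
          host: login.hpc.example.edu          # remote login node (placeholder)
          scheduler: slurm                     # target workload manager
          job-script: ci/repro_test.sbatch     # batch script with the test suite
        env:
          SSH_KEY: ${{ secrets.HPC_SSH_KEY }}  # scoped credential, never echoed
```

In this pattern, the CI runner never executes the tests itself; it only submits the batch job to the remote scheduler and collects the resulting logs and provenance records, which is what lets a standard hosted runner drive work on an access-restricted supercomputer.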