HPC Containers for EBRAINS: Towards Portable Cross-Domain Software Environment

📅 2026-03-12
🤖 AI Summary
This work addresses the challenge of deploying scientific workflows across heterogeneous HPC sites, where divergent software environments often compromise both portability and performance. The authors propose a containerization approach based on Apptainer, integrated with a PMIx hybrid runtime strategy that dynamically leverages host GPU and networking hardware without requiring recompilation, thereby enabling reuse of native drivers and software stacks. The solution is embedded within the Spack ecosystem via the EBRAINS Software Distribution and incorporates low-level log analysis to proactively detect communication anomalies. Experimental results on the Karolina and Jureca-DC clusters demonstrate that the resulting MPI+CUDA container images achieve performance on par with bare-metal deployments across OSU and NCCL microbenchmarks as well as neuroscience applications, confirming the feasibility of high-performance, reproducible execution across diverse platforms.
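
To make the hybrid strategy concrete, the sketch below assembles a launch line in which the host's Slurm starts the ranks through PMIx and each rank runs inside the container. It is a minimal illustration under stated assumptions, not the authors' launch procedure: the image name `esd.sif`, the helper `build_hybrid_launch`, and the choice of `osu_bw` as the program are hypothetical, and whether `srun --mpi=pmix` is available depends on how a site's Slurm was built.

```python
import shlex


def build_hybrid_launch(image, program, nodes, tasks_per_node, use_gpu=True):
    """Return an argv list: host-side srun (PMIx) wrapping `apptainer exec`.

    Hypothetical helper for illustration only; the summary does not give the
    exact command line used on Karolina or Jureca-DC.
    """
    # The host launcher wires up the ranks via PMIx.
    cmd = [
        "srun",
        "--mpi=pmix",
        f"--nodes={nodes}",
        f"--ntasks-per-node={tasks_per_node}",
    ]
    # Each rank then executes inside the container image.
    cmd += ["apptainer", "exec"]
    if use_gpu:
        # --nv exposes the host NVIDIA driver libraries inside the container.
        cmd.append("--nv")
    cmd += [image, *program]
    return cmd


argv = build_hybrid_launch("esd.sif", ["osu_bw"], nodes=2, tasks_per_node=1)
print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.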

📝 Abstract
Deploying complex, distributed scientific workflows across diverse HPC sites is often hindered by site-specific dependencies and complex build environments. This paper investigates the design and performance of portable HPC container images capable of encapsulating MPI- and CUDA-enabled software stacks without sacrificing bare-metal performance. It is part of recent work within the EBRAINS Research Infrastructure to evaluate portable, Apptainer-based HPC container images targeting the EBRAINS Software Distribution (ESD), a Spack-based software ecosystem comprising approximately 80 top-level packages and 800 dependencies. We evaluate a hybrid, PMIx-based containerization strategy using Apptainer that avoids site-specific builds by dynamically leveraging specialized host-level hardware, such as network interfaces and GPUs, on two production HPC clusters: Karolina and Jureca-DC. We demonstrate the feasibility of building portable, MPI- and CUDA-enabled scientific software into container images that correctly leverage site-installed drivers and hardware to reproduce bare-metal communication behavior. Using communication microbenchmarks (e.g., OSU and NCCL) alongside performance metrics of neuroscience applications, we measure the containers' performance and verify it against bare-metal deployments. Crucially, our verification approach extends beyond top-level runtime measurements: we analyze the underlying debug logs to detect misbehavior and misconfigurations, such as suboptimal transport pathways. Ultimately, this investigation demonstrates the feasibility of a simple and reproducible methodology for decoupling software environments from the underlying infrastructure, paving the way for automated pipelines that ensure optimized, performance-verified execution across varied HPC architectures.
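
The log-based verification described in the abstract can be sketched as a small check over communication debug output. The example below is an assumption-laden illustration, not the authors' tooling: it assumes NCCL was run with `NCCL_DEBUG=INFO` and that the log contains lines of the form `NCCL INFO Using network <name>`, a format that may differ between NCCL versions. The function names and the expected transport `"IB"` are hypothetical.

```python
import re

# Assumed log format; adjust the pattern to the NCCL version actually in use.
NETWORK_LINE = re.compile(r"NCCL INFO Using network (\S+)")


def selected_networks(log_lines):
    """Collect every network name that the log reports as selected."""
    found = set()
    for line in log_lines:
        match = NETWORK_LINE.search(line)
        if match:
            found.add(match.group(1))
    return found


def check_transport(log_lines, expected="IB"):
    """Report whether only the expected transport was selected.

    A log with no matching line counts as not verified, because the absence
    of evidence says nothing about which path was taken.
    """
    found = selected_networks(log_lines)
    return {
        "ok": found == {expected},
        "found": sorted(found),
        "unexpected": sorted(found - {expected}),
    }
```

A run whose log reports a socket fallback would return `ok: False` with the fallback listed under `unexpected`, which is the kind of suboptimal transport pathway that a runtime measurement alone could hide.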
Problem

Research questions and friction points this paper is trying to address.

HPC
portability
scientific workflows
containerization
cross-domain
Innovation

Methods, ideas, or system contributions that make the work stand out.

HPC containers
Apptainer
PMIx
portable software environment
performance verification