Multi-GPU Hybrid Particle-in-Cell Monte Carlo Simulations for Exascale Computing Systems

📅 2026-03-25
🤖 AI Summary
This work addresses the challenges of plasma particle-in-cell simulations on heterogeneous supercomputing systems, including frequent data movement, high synchronization overhead, and underutilized multi-GPU resources. The authors propose a hybrid MPI+OpenMP parallelization of the BIT1 code that leverages OpenMP tasking with explicit dependency management to overlap computation and communication on both NVIDIA and AMD GPUs. By integrating persistent device memory, a one-dimensional contiguous data layout, pinned host memory, and GPU-direct DMA transfers, the approach significantly improves data-transfer efficiency and device memory access. In addition, integration with openPMD and ADIOS2 provides high-performance I/O. Evaluated on pre-exascale and exascale systems, including Frontier (OLCF-5), the implementation scales to 16,000 GPUs, substantially reducing runtime while improving portability, scalability, and hardware utilization.
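The task-based overlap described in the summary can be pictured with a short sketch. The example below is not BIT1 source code: the kernel body, buffer names, and sizes are placeholders, and it assumes at least one offload device per MPI rank and an MPI library initialized with MPI_THREAD_MULTIPLE. It shows the general pattern of asynchronous `target nowait` regions with explicit `depend` clauses, so that a host task can perform MPI communication while device kernels on the remaining GPUs are still running.

```c
/*
 * Minimal sketch (not BIT1 source): one simulation step that overlaps
 * per-device particle work with a host-side MPI halo exchange using
 * OpenMP target tasks and explicit dependencies.
 * Kernel body, buffer names, and sizes are placeholders; assumes at
 * least one offload device per MPI rank and MPI_THREAD_MULTIPLE.
 */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

#define N_PART 1024   /* particles per device slab (placeholder) */
#define N_HALO 64     /* halo buffer length (placeholder)        */

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n_dev = omp_get_num_devices();
    if (n_dev < 1) n_dev = 1;                  /* fall back to the host device */

    double *part = calloc((size_t)n_dev * N_PART, sizeof *part);
    double *halo = calloc(N_HALO, sizeof *halo);

    #pragma omp parallel
    #pragma omp single
    {
        for (int d = 0; d < n_dev; ++d) {
            double *p = part + (long)d * N_PART;   /* this device's slab */

            /* Deferred target task on device d; completion is tracked
             * through the dependency on p[0], so other work may overlap. */
            #pragma omp target teams distribute parallel for \
                    nowait depend(out: p[0]) device(d) map(tofrom: p[0:N_PART])
            for (int i = 0; i < N_PART; ++i)
                p[i] += 1.0;                       /* stand-in particle push */
        }

        /* Host task: exchanges the halo with a neighbor while device kernels
         * are still in flight. In a real code the halo would be packed from
         * slab 0 after its push; the dependency only waits on that slab. */
        #pragma omp task depend(in: part[0])
        {
            int peer = rank ^ 1;                   /* toy neighbor pattern */
            if (peer < size)
                MPI_Sendrecv_replace(halo, N_HALO, MPI_DOUBLE,
                                     peer, 0, peer, 0,
                                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        #pragma omp taskwait                       /* join all pending tasks */
    }

    free(part);
    free(halo);
    MPI_Finalize();
    return 0;
}
```

Tying the dependency to the first element of each device's particle slab lets the runtime release the host-side exchange as soon as the one slab it needs is ready, rather than only after all devices have finished.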

📝 Abstract
Particle-in-Cell (PIC) Monte Carlo (MC) simulations are central to plasma physics but face increasing challenges on heterogeneous HPC systems due to excessive data movement, synchronization overheads, and inefficient utilization of multiple accelerators. In this work, we present a portable, multi-GPU hybrid MPI+OpenMP implementation of BIT1 that enables scalable execution on both Nvidia and AMD accelerators through OpenMP target tasks with explicit dependencies to overlap computation and communication across devices. Portability is achieved through persistent device-resident memory, an optimized contiguous one-dimensional data layout, and a transition from unified to pinned host memory to improve large data-transfer efficiency, together with GPU Direct Memory Access (DMA) and runtime interoperability for direct device-pointer access. Standardized and scalable I/O is provided using openPMD and ADIOS2, supporting high-performance file I/O, in-memory data streaming, and in-situ analysis and visualization. Performance results on pre-exascale and exascale systems, including Frontier (OLCF-5) for up to 16,000 GPUs, demonstrate significant improvements in run time, scalability, and resource utilization for large-scale PIC MC simulations.
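As a rough illustration of the memory-management points in the abstract (persistent device-resident buffers, pinned host staging memory, and direct device-pointer access for GPU-direct transfers), the sketch below uses only standard OpenMP 5.x allocator and target-memory routines together with MPI. It is not taken from BIT1: buffer names and sizes are invented, honoring the pinned allocator trait is implementation-dependent, and passing device pointers to MPI requires a GPU-aware MPI build.

```c
/*
 * Minimal sketch (not BIT1 source): persistent device-resident field buffers,
 * a pinned host staging buffer requested through an OpenMP 5.x allocator
 * trait, and raw device pointers handed to MPI for a GPU-direct transfer.
 * Names and sizes are placeholders; honoring the pinned trait and accepting
 * device pointers in MPI calls both depend on the runtime and MPI build.
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N_FIELD (1 << 20)                       /* placeholder field size */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int dev  = omp_get_default_device();
    int host = omp_get_initial_device();

    /* Pinned (page-locked) host staging buffer via an allocator trait;
     * falls back to pageable host memory if the runtime ignores the trait. */
    omp_alloctrait_t traits[1] = { { omp_atk_pinned, omp_atv_true } };
    omp_allocator_handle_t pinned_alloc =
        omp_init_allocator(omp_default_mem_space, 1, traits);
    double *h_field = omp_alloc(N_FIELD * sizeof *h_field, pinned_alloc);

    /* Persistent device-resident buffers: allocated once and reused every
     * step, so field data never migrates through unified/managed memory. */
    double *d_send = omp_target_alloc(N_FIELD * sizeof *d_send, dev);
    double *d_recv = omp_target_alloc(N_FIELD * sizeof *d_recv, dev);

    for (size_t i = 0; i < N_FIELD; ++i)
        h_field[i] = (double)rank;

    /* Single contiguous host-to-device copy of the 1-D field layout. */
    omp_target_memcpy(d_send, h_field, N_FIELD * sizeof *d_send,
                      0, 0, dev, host);

    /* With a GPU-aware MPI, the device pointers can be passed directly and
     * the exchange uses GPU-direct DMA, skipping any extra host copy. */
    if (size > 1) {
        int right = (rank + 1) % size, left = (rank - 1 + size) % size;
        MPI_Sendrecv(d_send, N_FIELD, MPI_DOUBLE, right, 0,
                     d_recv, N_FIELD, MPI_DOUBLE, left,  0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else {
        /* Single rank: device-to-device copy so the output below is defined. */
        omp_target_memcpy(d_recv, d_send, N_FIELD * sizeof *d_recv,
                          0, 0, dev, dev);
    }

    /* Copy the received field back for host-side output. */
    omp_target_memcpy(h_field, d_recv, N_FIELD * sizeof *h_field,
                      0, 0, host, dev);
    printf("rank %d received field[0] = %g\n", rank, h_field[0]);

    omp_target_free(d_send, dev);
    omp_target_free(d_recv, dev);
    omp_free(h_field, pinned_alloc);
    omp_destroy_allocator(pinned_alloc);
    MPI_Finalize();
    return 0;
}
```

Without GPU-aware MPI, the same exchange would need explicit staging copies through the pinned host buffer before and after the MPI call; the pinned allocation still helps in that case, since page-locked memory is what enables asynchronous DMA between host and device.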
Problem

Research questions and friction points this paper is trying to address.

Particle-in-Cell
Monte Carlo
multi-GPU
exascale computing
heterogeneous HPC
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-GPU
Hybrid MPI+OpenMP
OpenMP target tasks
GPU Direct Memory Access
openPMD/ADIOS2
Authors

Jeremy J. Williams
KTH Royal Institute of Technology
Performance Engineering, Data Centers, HPC, HPDA, Supercomputing

Jordy Trilaksono
Max Planck Institute for Plasma Physics, Garching, Germany

Stefan Costea
Faculty of Mechanical Engineering, University of Ljubljana, Ljubljana, Slovenia

Yi Ju
Systems Engineering, UC Berkeley
electric vehicles, sustainable infrastructures, smart grid, built environment

Luca Pennati
KTH Royal Institute of Technology
High-Performance Computing, Plasma Physics

Jonah Ekelund
KTH Royal Institute of Technology
Computational Science, Space Physics, Orbital Dynamics

David Tskhakaya
Unknown affiliation

Leon Kos
University of Ljubljana
plasma, fusion, visualisation, HPC, parallel computing

Ales Podolnik
Researcher, Institute of Plasma Physics, The Czech Academy of Sciences
plasma physics, simulations, Langmuir probes, plasma-wall interaction, high performance computing

Jakub Hromadka
Institute of Plasma Physics of the CAS, Prague, Czech Republic

Allen D. Malony
University of Oregon
parallel computing, performance analysis

Sameer Shende
Research Professor and Director, Performance Research Laboratory, University of Oregon
Performance Evaluation Tools, Instrumentation, Measurement, Runtime Systems

Tilman Dannert
Max Planck Computing and Data Facility, Garching, Germany

Frank Jenko
Max Planck Institute for Plasma Physics

Erwin Laure
Unknown affiliation

Stefano Markidis
Professor, KTH Royal Institute of Technology
High Performance Computing, Computational Plasma Physics, Quantum Computing