🤖 AI Summary
This work addresses the challenges of plasma particle-in-cell simulations on heterogeneous supercomputing systems, including frequent data movement, high synchronization overhead, and underutilized multi-GPU resources. The authors propose a hybrid MPI+OpenMP parallelization strategy that leverages OpenMP tasking with explicit dependency management to overlap computation and communication on both NVIDIA and AMD GPUs. By integrating persistent device memory, a one-dimensional contiguous data layout, pinned host memory, and GPU-direct DMA transfers, the approach significantly improves data-transfer efficiency and device memory access. Furthermore, seamless integration with openPMD and ADIOS2 enables high-performance I/O. Evaluated on pre-exascale and exascale systems, including Frontier, the implementation scales to 16,000 GPUs, substantially reducing runtime while markedly improving portability, scalability, and hardware utilization.
📝 Abstract
Particle-in-Cell (PIC) Monte Carlo (MC) simulations are central to plasma physics but face increasing challenges on heterogeneous HPC systems due to excessive data movement, synchronization overheads, and inefficient utilization of multiple accelerators. In this work, we present a portable, multi-GPU hybrid MPI+OpenMP implementation of BIT1 that enables scalable execution on both NVIDIA and AMD accelerators through OpenMP target tasks with explicit dependencies to overlap computation and communication across devices. Portability is achieved through persistent device-resident memory, an optimized contiguous one-dimensional data layout, and a transition from unified to pinned host memory to improve large data-transfer efficiency, together with GPU Direct Memory Access (DMA) and runtime interoperability for direct device-pointer access. Standardized and scalable I/O is provided using openPMD and ADIOS2, supporting high-performance file I/O, in-memory data streaming, and in-situ analysis and visualization. Performance results on pre-exascale and exascale systems, including Frontier (OLCF-5) on up to 16,000 GPUs, demonstrate significant improvements in run time, scalability, and resource utilization for large-scale PIC MC simulations.