On the performance of two-sided MPI, MPI-3 RMA and SHMEM in a Lagrangian particle cluster algorithm

📅 2024-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the parallel efficiency bottleneck of the Lagrangian particle cluster algorithm used in the elliptical parcel-in-cell (EPIC) method for N-body simulations. It systematically compares three distributed-memory communication models: two-sided (point-to-point) MPI, MPI-3 remote memory access (RMA), and SHMEM, in the context of 3D Euclidean nearest-neighbor search and iterative graph pruning. To the authors' knowledge, this is the first empirical comparison of the scalability and communication overhead of these models under realistic, irregular graph-pruning workloads. Experiments are conducted on three HPC platforms with InfiniBand FDR and HPE Slingshot interconnects. Results show that MPI-3 RMA delivers up to a 1.8× speedup over conventional two-sided MPI at moderate scales, while SHMEM reduces latency by 40% in high-density local communication regimes. These improvements enhance both the strong and weak scaling efficiency of EPIC, and the study provides empirically grounded guidance for communication-model selection in irregular graph algorithms.
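To make the comparison concrete, the sketch below shows the kind of one-sided access that MPI-3 RMA enables during graph pruning: a rank reads a remote nearest-neighbour entry without the owning rank posting a matching receive. This is a minimal illustration in C, not the EPIC implementation; the array name `nearest`, its size, and the ring-shaped target choice are assumptions made for the example.

```c
/* Minimal sketch (not the authors' code): fetching a remote
 * nearest-neighbour index with MPI-3 RMA instead of a
 * two-sided send/receive pair. Names and sizes are assumed. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int nlocal = 1024;   /* parcels per rank (assumed) */
    int *nearest;              /* local slice of the NN graph */

    /* Expose the local nearest-neighbour indices in an RMA window. */
    MPI_Win win;
    MPI_Win_allocate(nlocal * sizeof(int), sizeof(int),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &nearest, &win);
    for (int i = 0; i < nlocal; ++i)
        nearest[i] = rank * nlocal + i;   /* placeholder initialisation */
    MPI_Barrier(MPI_COMM_WORLD);          /* ensure all windows are filled */

    /* One-sided read: pull one entry from the next rank without the
     * target posting a matching receive (the key difference from
     * two-sided MPI during irregular graph pruning). */
    int target = (rank + 1) % nranks;
    int remote_nn;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Get(&remote_nn, 1, MPI_INT, target, /* disp = */ 0, 1, MPI_INT, win);
    MPI_Win_unlock(target, win);          /* completes the transfer */

    printf("rank %d read %d from rank %d\n", rank, remote_nn, target);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

The passive-target lock/unlock epoch used here is only one of the synchronisation options MPI-3 offers; the choice of epoch affects the latency and synchronisation overhead that measurements like the paper's compare.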

📝 Abstract
In this paper, we compare the parallel performance of three distributed-memory communication models for a cluster algorithm based on a nearest neighbour search for N-body simulations. The nearest neighbour is defined by the Euclidean distance in three-dimensional space. The resulting directed nearest neighbour graphs that are used to define the clusters are pruned in an iterative procedure where we use either point-to-point message passing interface (MPI), MPI-3 remote memory access (RMA), or SHMEM communication. The original algorithm has been developed and implemented as part of the elliptical parcel-in-cell (EPIC) method targeting geophysical fluid flows. The parallel scalability of the algorithm is discussed by means of an artificial and a standard fluid dynamics test case. Performance measurements were carried out on three different computing systems with InfiniBand FDR, Hewlett Packard Enterprise (HPE) Slingshot 10 or HPE Slingshot 200 interconnect.
Problem

Research questions and friction points this paper is trying to address.

Compare the performance of two-sided MPI, MPI-3 RMA and SHMEM in N-body simulations
Analyse the scalability of the nearest-neighbour cluster algorithm in fluid dynamics
Evaluate the communication models on different high-performance computing systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compares two-sided MPI, MPI-3 RMA and SHMEM communication models (see the OpenSHMEM sketch after this list)
Uses a nearest-neighbour search to define clusters in N-body simulations
Tests scalability on InfiniBand FDR and HPE Slingshot interconnects
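For comparison, the same remote lookup can be written in OpenSHMEM, where every processing element (PE) exposes a symmetric array and a one-sided get replaces the send/receive pair. As above, this is a hedged sketch with assumed names and sizes, not the authors' code.

```c
/* Minimal OpenSHMEM sketch (assumed names, not the EPIC code):
 * the remote lookup from the MPI-3 RMA example, expressed with
 * shmem_int_get on a symmetric array. */
#include <shmem.h>
#include <stdio.h>

#define NLOCAL 1024               /* parcels per PE (assumed) */

static int nearest[NLOCAL];       /* symmetric: same address on every PE */

int main(void)
{
    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    for (int i = 0; i < NLOCAL; ++i)
        nearest[i] = me * NLOCAL + i;   /* placeholder initialisation */
    shmem_barrier_all();                /* make initial values visible */

    /* One-sided get from the neighbouring PE; no matching call is
     * needed on the target, which suits irregular pruning traffic. */
    int remote_nn;
    shmem_int_get(&remote_nn, &nearest[0], 1, (me + 1) % npes);

    printf("PE %d read %d\n", me, remote_nn);
    shmem_finalize();
    return 0;
}
```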
Matthias Frey
Mathematical Institute, University of St Andrews, KY16 9SS, UK
Steven Böing
University of Leeds/Met Office Strategic Research Group, School of Earth and Environment, LS2 9JT, UK
Rui F. G. Apóstolo
EPCC, The University of Edinburgh, EH8 9BT, UK
Douglas Shanks