Scaling MPI Applications on Aurora

📅 2025-12-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
MPI applications on the Aurora supercomputer, whose Slingshot network connects over ten thousand nodes in a dragonfly topology, face communication bottlenecks that limit scalability. Method: This work details and validates the system design across the full hardware stack: nearly 85,000 Cassini NICs and 5,600 Rosetta switches integrated with Intel Xeon Max CPUs, Intel Data Center Max Series GPUs, and the HPE Slingshot interconnect. Evaluation spans standard benchmarks (HPL, HPL-MxP, HPCG, Graph500) and production scientific applications (HACC, AMR-Wind, LAMMPS, FMM). Contribution/Results: Aurora reached #2 on the June 2024 TOP500 list and #1 on the HPL-MxP benchmark, and the MPI and application measurements show that the system delivers the latency, bandwidth, and throughput needed for applications to scale to large node counts, supporting AI and HPC simulation for open science at exascale.
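
As a rough illustration of the MPI-level measurements the summary refers to, the sketch below is a minimal ping-pong microbenchmark in C that reports point-to-point latency and bandwidth between two ranks. It is not the benchmark suite used in the paper; the message size and iteration count are assumptions chosen for illustration.

```c
/* Minimal MPI ping-pong sketch of the kind of point-to-point measurement the
 * evaluation refers to. Run with exactly two ranks, ideally on different
 * nodes; message size and iteration count are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int msg_bytes = 1 << 20;   /* 1 MiB payload (assumed) */
    const int iters = 1000;          /* timed iterations (assumed) */
    char *buf = malloc(msg_bytes);

    /* Warm-up plus timed loop: rank 0 sends, rank 1 echoes the message back. */
    double t0 = 0.0;
    for (int i = -10; i < iters; ++i) {
        if (i == 0) t0 = MPI_Wtime();        /* start timing after warm-up */
        if (rank == 0) {
            MPI_Send(buf, msg_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, msg_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double rtt = (MPI_Wtime() - t0) / iters;  /* average round-trip time */

    if (rank == 0) {
        printf("half round-trip latency: %.2f us\n", 0.5 * rtt * 1e6);
        printf("bandwidth: %.2f GB/s\n", 2.0 * msg_bytes / rtt / 1e9);
    }
    free(buf);
    MPI_Finalize();
    return 0;
}
```

Sweeping the message size from a few bytes to many megabytes with a test like this is the usual way to separate small-message latency from large-message bandwidth.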

📝 Abstract
The Aurora supercomputer, which was deployed at Argonne National Laboratory in 2024, is currently one of three Exascale machines in the world on the Top500 list. The Aurora system is composed of over ten thousand nodes, each of which contains six Intel Data Center Max Series GPUs, Intel's first data center-focused discrete GPU, and two Intel Xeon Max Series CPUs, Intel's first Xeon processor to contain HBM memory. To achieve Exascale performance the system utilizes the HPE Slingshot high-performance fabric to interconnect the nodes. Aurora is currently the largest deployment of the Slingshot fabric to date, with nearly 85,000 Cassini NICs and 5,600 Rosetta switches connected in a dragonfly topology. The combination of the Intel-powered nodes and the Slingshot network enabled Aurora to become the second fastest system on the Top500 list in June of 2024 and the fastest system on the HPL-MxP benchmark. The system is one of the most powerful systems in the world dedicated to AI and HPC simulations for open science. This paper presents details of the Aurora system design, with a particular focus on the network fabric and the approach taken to validating it. The performance of the system is demonstrated through the results of MPI benchmarks as well as performance benchmarks including HPL, HPL-MxP, Graph500, and HPCG run on a large fraction of the system. Additionally, results are presented for a diverse set of applications, including HACC, AMR-Wind, LAMMPS, and FMM, demonstrating that Aurora provides the throughput, latency, and bandwidth across the system needed to allow applications to perform and scale to large node counts, providing new levels of capability and enabling breakthrough science.
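
Each Aurora node carries six GPUs and two CPUs, so applications typically place several MPI ranks per node and bind each rank to one device. The sketch below shows one common way to derive a node-local rank and map it to one of six GPUs; this scheme is an assumption for illustration, not a mapping prescribed by the paper.

```c
/* Hypothetical rank-to-GPU mapping for a node with six GPUs, as on Aurora.
 * A node-local communicator gives each rank a local index used to pick a
 * device; shown only as a common pattern, not the paper's scheme. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Split ranks by shared-memory node to get a per-node communicator. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int local_rank;
    MPI_Comm_rank(node_comm, &local_rank);

    /* With six GPUs per node, ranks commonly bind round-robin to devices. */
    const int gpus_per_node = 6;
    int device = local_rank % gpus_per_node;
    printf("world rank %d -> node-local rank %d -> GPU %d\n",
           world_rank, local_rank, device);
    /* A real application would hand 'device' to its GPU runtime
     * (for example a SYCL device selector) before allocating buffers. */

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```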
Problem

Research questions and friction points this paper is trying to address.

Scaling MPI applications on the Aurora exascale supercomputer
Validating the Slingshot network fabric performance and design
Demonstrating application throughput, latency, and bandwidth at scale
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Intel Max Series GPUs and CPUs for high performance
Employs HPE Slingshot fabric in dragonfly topology for connectivity
Validates system with MPI and diverse application benchmarks
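
To make the validation idea concrete, the sketch below times a repeated MPI_Allreduce, the kind of collective whose behavior at increasing node counts indicates whether the fabric scales. The payload size and iteration count are assumed values; this is not the paper's actual test harness.

```c
/* Sketch of an MPI_Allreduce timing loop: run the same binary at several
 * node counts and compare the reported times to gauge collective scaling.
 * Buffer size and iteration count are illustrative assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int count = 1 << 20;       /* doubles per rank (assumed) */
    const int iters = 100;           /* timed iterations (assumed) */
    double *in  = malloc(count * sizeof(double));
    double *out = malloc(count * sizeof(double));
    for (int i = 0; i < count; ++i) in[i] = (double)rank;

    /* Synchronize, then time repeated allreduces. */
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; ++i)
        MPI_Allreduce(in, out, count, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double elapsed = MPI_Wtime() - t0;

    /* Report the slowest rank's average time, which governs scaling. */
    double max_elapsed;
    MPI_Reduce(&elapsed, &max_elapsed, 1, MPI_DOUBLE, MPI_MAX, 0,
               MPI_COMM_WORLD);
    if (rank == 0)
        printf("%d ranks: avg MPI_Allreduce time %.3f ms\n",
               size, 1e3 * max_elapsed / iters);

    free(in); free(out);
    MPI_Finalize();
    return 0;
}
```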
Authors
Huda Ibeid
Intel Corporation, Santa Clara, California, USA
Anthony-Trung Nguyen
Intel Corporation, Santa Clara, California, USA
Aditya Nishtala
Intel Corporation, Santa Clara, California, USA
Premanand Sakarda
Intel Corporation, Santa Clara, California, USA
Larry Kaplan
Hewlett Packard Enterprise, Shoreline, Washington, USA
Nilakantan Mahadevan
Hewlett Packard Enterprise, Spring, Texas, USA
Michael Woodacre
Hewlett Packard Enterprise, Spring, Texas, USA
Victor Anisimov
Argonne National Laboratory, Lemont, Illinois, USA
Kalyan Kumaran
Argonne National Laboratory
JaeHyuk Kwack
Argonne National Laboratory, Lemont, Illinois, USA
Vitali Morozov
Argonne National Laboratory, Lemont, Illinois, USA
Servesh Muralidharan
Argonne Leadership Computing Facility
Scott Parker
Argonne National Laboratory, Lemont, Illinois, USA