Performance Analysis of HPC applications on the Aurora Supercomputer: Exploring the Impact of HBM-Enabled Intel Xeon Max CPUs

📅 2025-04-04
🤖 AI Summary
This study examines memory subsystem configuration on the Aurora exascale supercomputer. It quantifies how the high-bandwidth memory (HBM) integrated into Intel Xeon Max CPUs affects HPC performance, and it characterizes the trade-offs among HBM vs. DDR memory, Flat vs. Cache memory modes, and Quad vs. SNC4 clustering modes.
Method: The evaluation combines microbenchmarks (STREAM, b_eff), MPI and CPU-GPU bandwidth measurements taken across the memory and clustering modes, and three representative HPC applications (HACC, QMCPACK, BFS).
Contribution/Results: The work is presented as the first characterization of HBM effects on a real exascale system. HBM delivers up to 2.3× speedup for bandwidth-bound workloads, whereas Cache mode benefits latency-sensitive applications. Based on these findings, the authors propose a memory configuration decision framework driven by application memory access characteristics. The framework has been formally adopted into the Aurora User Guide and provides a transferable methodology for memory subsystem optimization in exascale systems.

📝 Abstract
The Aurora supercomputer is an exascale-class system designed to tackle some of the most demanding computational workloads. Equipped with both High Bandwidth Memory (HBM) and DDR memory, it provides unique trade-offs in performance, latency, and capacity. This paper presents a comprehensive analysis of the memory systems on the Aurora supercomputer, with a focus on evaluating the trade-offs between HBM and DDR memory systems. We explore how different memory configurations, including memory modes (Flat and Cache) and clustering modes (Quad and SNC4), influence key system performance metrics such as memory bandwidth, latency, CPU-GPU PCIe bandwidth, and MPI communication bandwidth. Additionally, we examine the performance of three representative HPC applications -- HACC, QMCPACK, and BFS -- each illustrating the impact of memory configurations on performance. By using microbenchmarks and application-level analysis, we provide insights into how to select the optimal memory system and configuration to maximize performance based on the application characteristics. The findings presented in this paper offer guidance for users of the Aurora system and similar exascale systems.
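The abstract names memory bandwidth as a key metric and STREAM-style microbenchmarks as the way it is measured. As a minimal sketch of how a triad kernel's timing is turned into a bandwidth figure, the example below counts three 8-byte accesses per element (two reads, one write), which is the standard STREAM accounting. This is not the paper's benchmark: the function names are hypothetical, and a pure-Python loop measures interpreter overhead rather than HBM or DDR throughput.

```python
import time
from array import array

BYTES_PER_DOUBLE = 8


def triad_bytes(n):
    """Bytes STREAM counts for one triad pass: read b, read c, write a."""
    return 3 * n * BYTES_PER_DOUBLE


def bandwidth_gb_per_s(n, seconds):
    """Convert one triad pass over n elements into GB/s (1 GB = 1e9 bytes)."""
    return triad_bytes(n) / seconds / 1e9


def run_triad(n, scalar=3.0, repeats=5):
    """Time a[i] = b[i] + scalar * c[i]; return (best seconds, result array)."""
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    a = array("d", [0.0]) * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(n):
            a[i] = b[i] + scalar * c[i]
        # Keep the fastest repetition, as STREAM reports best-case rates.
        best = min(best, time.perf_counter() - start)
    return best, a


if __name__ == "__main__":
    n = 200_000  # illustrative size only; real runs use arrays far larger than cache
    seconds, _ = run_triad(n)
    print(f"triad: {bandwidth_gb_per_s(n, seconds):.3f} GB/s (interpreter-bound)")
```

Comparing HBM against DDR would require a compiled STREAM binary with its memory placed on one memory type at a time; how that placement is done on Aurora is not described in this summary.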

Problem

Research questions and friction points this paper is trying to address.

Analyze HBM vs DDR memory trade-offs on Aurora
Evaluate memory modes' impact on HPC performance
Optimize memory configs for exascale applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes HBM and DDR memory trade-offs
Evaluates Flat and Cache memory modes
Tests HPC apps for optimal configurations