Characterizing GPU Energy Usage in Exascale-Ready Portable Science Applications

📅 2025-05-08
🤖 AI Summary
Characterizing the energy efficiency of exascale scientific applications remains challenging because of hardware heterogeneity and limited cross-platform benchmarking. Method: This study quantifies the GPU energy consumption of QMCPACK (a particle-based solver) and AMReX-Castro (a grid-based solver) on NVIDIA A100/H100 and AMD MI250X GPUs, using power, temperature, utilization, and energy traces queried through NVML and rocm_smi_lib, mixed-precision (FP64/FP32) benchmarks, and application-specific energy-efficiency metrics. Contribution/Results: It presents a cross-vendor, application-level energy comparison for exascale-ready workloads. Key findings include GPU energy savings from mixed precision of 45% for AMReX-Castro and 6–25% for QMCPACK; gaps in AMD's tooling on the Frontier system that still need to be understood; and the observation that NVML query resolutions between 1 ms and 1 s show little variability. These results offer trade-off insights for the co-design of hardware and software in the post-Moore era.

📝 Abstract
We characterize the GPU energy usage of two widely adopted exascale-ready applications representing two classes of solvers, particle and mesh: (i) QMCPACK, a quantum Monte Carlo package, and (ii) AMReX-Castro, an adaptive-mesh astrophysical code. We analyze power, temperature, utilization, and energy traces from double- and single (mixed)-precision benchmarks on NVIDIA's A100 and H100 and AMD's MI250X GPUs using queries in NVML and rocm_smi_lib, respectively. We explore application-specific metrics to provide insights on energy vs. performance trade-offs. Our results suggest that mixed-precision energy savings range from 6% to 25% on QMCPACK and reach 45% on AMReX-Castro. Gaps remain in the AMD tooling on Frontier GPUs that need to be understood, while query resolutions on NVML show little variability between 1 ms and 1 s. Overall, application-level knowledge is crucial to define energy-cost/science-benefit opportunities for the codesign of future supercomputer architectures in the post-Moore era.
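As a rough illustration of how an energy saving can be derived from power traces like those described above, the sketch below integrates timestamped power samples into energy with the trapezoidal rule and compares two runs. The function names and the sample values are hypothetical and are not taken from the paper; with NVML, samples of this kind could come from nvmlDeviceGetPowerUsage, which reports power in milliwatts.

```python
from typing import Sequence


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule."""
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def savings_percent(baseline_j: float, reduced_j: float) -> float:
    """Relative energy saving of a run with respect to a baseline, in percent."""
    if baseline_j <= 0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (baseline_j - reduced_j) / baseline_j


if __name__ == "__main__":
    # Made-up traces for illustration only (not measurements from the paper):
    # a longer double-precision run and a shorter mixed-precision run.
    double_run = energy_joules([0.0, 1.0, 2.0, 3.0], [300.0, 300.0, 300.0, 300.0])
    mixed_run = energy_joules([0.0, 1.0, 2.0], [300.0, 300.0, 300.0])
    print(f"double: {double_run:.1f} J, mixed: {mixed_run:.1f} J")
    print(f"saving: {savings_percent(double_run, mixed_run):.1f} %")
```

In this toy case the saving comes entirely from a shorter runtime at equal power; in a real trace both the power level and the runtime may change between precisions.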
Problem

Research questions and friction points this paper is trying to address.

Analyzing GPU energy usage in exascale-ready particle and mesh solvers
Exploring energy-performance trade-offs in mixed-precision benchmarks on NVIDIA and AMD GPUs
Identifying gaps in AMD tooling and variability in NVML query resolutions
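One way to probe how sensitive an energy estimate is to the query resolution is to take a finely sampled power trace, keep only every k-th sample, and compare the integrated energies. The sketch below does this on a synthetic trace; the helper names, the sinusoidal power profile, and its parameters are assumptions made for illustration and do not reproduce the paper's procedure or data.

```python
import math
from typing import List, Sequence, Tuple


def trapezoid_energy(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Energy in joules from power samples (watts) via the trapezoidal rule."""
    return sum(
        0.5 * (power_w[i] + power_w[i - 1]) * (times_s[i] - times_s[i - 1])
        for i in range(1, len(times_s))
    )


def resample(
    times_s: Sequence[float], power_w: Sequence[float], stride: int
) -> Tuple[List[float], List[float]]:
    """Keep every stride-th sample, always retaining the final sample."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if not times_s:
        return [], []
    idx = list(range(0, len(times_s), stride))
    if idx[-1] != len(times_s) - 1:
        idx.append(len(times_s) - 1)  # preserve the full time span
    return [times_s[i] for i in idx], [power_w[i] for i in idx]


def relative_difference_percent(reference: float, estimate: float) -> float:
    """Absolute difference between two energy estimates, relative to the reference."""
    return 100.0 * abs(estimate - reference) / reference


if __name__ == "__main__":
    # Synthetic 10 s trace sampled every 1 ms (hypothetical power profile).
    times = [i * 0.001 for i in range(10001)]
    power = [250.0 + 50.0 * math.sin(2.0 * math.pi * t / 2.0) for t in times]
    reference = trapezoid_energy(times, power)
    for stride in (10, 100, 1000):  # 10 ms, 100 ms, 1 s
        t_s, p_s = resample(times, power, stride)
        diff = relative_difference_percent(reference, trapezoid_energy(t_s, p_s))
        print(f"{stride} ms interval: {diff:.4f} % difference")
```

For slowly varying power, coarse sampling changes the integrated energy very little; traces with short spikes would be more sensitive to the interval.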
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyze GPU energy usage with NVML and rocm_smi_lib
Explore energy-performance trade-offs via application-specific metrics
Demonstrate mixed-precision energy savings up to 45%
William F. Godoy
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Oscar Hernandez
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Paul R. C. Kent
Oak Ridge National Laboratory
Computational Materials Science, Energy Storage, Nanotechnology, Supercomputing, Quantum Monte Carlo
Maria Patrou
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Kazi Asifuzzaman
Research Scientist, Oak Ridge National Laboratory
HPC, AI, Neuromorphic Computing, Computer Architecture, Quantum Computing
N. Miniskar
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Pedro Valero-Lara
Oak Ridge National Laboratory, Oak Ridge, TN, USA
Jeffrey S. Vetter
Oak Ridge National Laboratory
high performance computing
Matthew D. Sinclair
University of Wisconsin-Madison, Madison, WI, USA
Jason Lowe-Power
Associate Professor, University of California, Davis
Computer architecture
Bobby R. Bruce
University of California, Davis
Software Engineering, Simulation Software, gem5, SBSE