Characterizing GPU Energy Usage in Exascale-Ready Portable Science Applications

📅 2025-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Characterizing the energy efficiency of exascale scientific applications remains challenging because of hardware heterogeneity and the lack of cross-platform benchmarking. Method: This study quantifies the GPU energy consumption of QMCPACK (a particle-based solver) and AMReX-Castro (a grid-based solver) on NVIDIA A100/H100 and AMD MI250X GPUs, using power, temperature, and utilization telemetry queried through NVML and rocm_smi_lib, double- and mixed-precision (FP64/FP32) benchmarks, and application-specific energy-efficiency metrics. Contribution/Results: It presents a cross-vendor, application-level energy comparison for exascale-ready workloads. Key findings include GPU energy savings from mixed precision of 45% for AMReX-Castro and 6–25% for QMCPACK; gaps in AMD's tooling on Frontier GPUs that still need to be understood; and little variability in NVML measurements across query resolutions between 1 ms and 1 s. These results give empirically grounded insight into energy vs. performance trade-offs for the co-design of hardware and software in the post-Moore era.

📝 Abstract

We characterize the GPU energy usage of two widely adopted exascale-ready applications representing two classes of solvers, particle and mesh: (i) QMCPACK, a quantum Monte Carlo package, and (ii) AMReX-Castro, an adaptive mesh astrophysical code. We analyze power, temperature, utilization, and energy traces from double-precision and single (mixed)-precision benchmarks on NVIDIA's A100 and H100 and AMD's MI250X GPUs using queries in NVML and rocm_smi_lib, respectively. We explore application-specific metrics to provide insights on energy vs. performance trade-offs. Our results suggest that mixed-precision energy savings range between 6–25% on QMCPACK and reach 45% on AMReX-Castro. In addition, gaps remain in the AMD tooling on Frontier GPUs that need to be understood, whereas NVML query resolutions between 1 ms and 1 s show little variability. Overall, application-level knowledge is crucial to define energy-cost/science-benefit opportunities for the codesign of future supercomputer architectures in the post-Moore era.
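The step from a sampled power trace to an energy figure can be illustrated with a minimal sketch. It assumes the samples are already available as (time, power) pairs, for example collected by polling an NVML power query such as `nvmlDeviceGetPowerUsage`, which reports milliwatts. The function names `energy_joules` and `subsample` and the synthetic trace are hypothetical and are not taken from the paper; the trapezoidal rule is one common choice, not necessarily the authors' method.

```python
def energy_joules(times_s, power_w):
    """Integrate power (W) over time (s) with the trapezoidal rule -> energy (J)."""
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def subsample(times_s, power_w, stride):
    """Keep every `stride`-th sample, always retaining the final sample."""
    idx = list(range(0, len(times_s), stride))
    if idx[-1] != len(times_s) - 1:
        idx.append(len(times_s) - 1)
    return [times_s[i] for i in idx], [power_w[i] for i in idx]


# Synthetic trace (NOT measured data): 10 s sampled every 1 ms,
# with power ramping linearly from 200 W to 300 W.
n = 10_000
times = [i * 0.001 for i in range(n + 1)]
power = [200.0 + 10.0 * t for t in times]

fine = energy_joules(times, power)                       # 1 ms resolution
coarse = energy_joules(*subsample(times, power, 1000))   # 1 s resolution
print(f"1 ms: {fine:.3f} J, 1 s: {coarse:.3f} J")
```

For a smoothly varying trace like this one, the two resolutions give essentially the same energy; traces with short power spikes can behave differently, so the comparison is worth repeating on real data.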

Problem

Research questions and friction points this paper is trying to address.

Analyzing GPU energy usage in exascale-ready particle and mesh solvers

Exploring energy-performance trade-offs in mixed-precision benchmarks on NVIDIA and AMD GPUs

Identifying gaps in AMD tooling on Frontier GPUs and assessing how NVML query resolution (1 ms to 1 s) affects the measurements

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyze GPU energy usage with NVML and rocm_smi_lib

Explore energy-performance trade-offs via application-specific metrics

Demonstrate mixed-precision energy savings up to 45%
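
As a rough illustration of how savings figures and an application-specific metric of this kind can be computed, the sketch below derives a relative energy saving and an energy-per-unit-of-work value. The paper does not define its metrics in this summary, so `percent_savings`, `energy_per_unit_work`, the notion of "work units", and all numbers are hypothetical placeholders rather than results from the study.

```python
def percent_savings(baseline_j, candidate_j):
    """Relative energy saving of a candidate run versus a baseline, in percent."""
    if baseline_j <= 0:
        raise ValueError("baseline energy must be positive")
    return 100.0 * (baseline_j - candidate_j) / baseline_j


def energy_per_unit_work(energy_j, work_units):
    """Energy spent per unit of application work (e.g. per step or per sample)."""
    if work_units <= 0:
        raise ValueError("work_units must be positive")
    return energy_j / work_units


# Placeholder values, NOT measurements from the paper.
double_precision_j = 1000.0   # GPU energy of a double-precision run
mixed_precision_j = 550.0     # GPU energy of the matching mixed-precision run
steps = 200                   # hypothetical unit of work: simulation steps

saving = percent_savings(double_precision_j, mixed_precision_j)
print(f"mixed-precision saving: {saving:.1f}%")
print(f"J/step (double): {energy_per_unit_work(double_precision_j, steps):.2f}")
print(f"J/step (mixed):  {energy_per_unit_work(mixed_precision_j, steps):.2f}")
```

Normalizing by a unit of scientific work, rather than by wall-clock time alone, keeps the comparison meaningful when the two precisions finish in different amounts of time.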