Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes

📅 2026-04-07
🤖 AI Summary
This work addresses the challenge of accurately attributing short-duration accelerator energy consumption on AMD GPU/APU-based supercomputing nodes, where heterogeneous power sensors differ in scope, update rate, timing, and filtering. The authors propose a portable sensor calibration and fusion methodology that uses square-wave workloads to characterize sensor responses, reconstructs power signals from cumulative energy counters, and fuses on-chip, off-chip, and node-level sensor data to achieve time-aligned, phase-level energy attribution. Applied to rocHPL, rocHPL-MxP, and HPG-MxP on the Frontier and Portage systems, the approach separates energy savings due to reduced runtime from those arising from changes in power draw. Mixed-precision execution reduces node-level energy consumption on Frontier by 79% for rocHPL-MxP and 31% for HPG-MxP, with similar trends on Portage.
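One way to picture the split between runtime-driven and power-driven savings is a symmetric decomposition of E = P·T. The sketch below is illustrative only and is not taken from the paper: the `Run` type, the function name, and the decomposition rule (each factor's change weighted by the average of the other factor) are assumptions, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical per-run measurement: total energy and wall-clock runtime."""
    energy_j: float
    runtime_s: float

    @property
    def mean_power_w(self) -> float:
        # Mean power over the run, from E = P * T.
        return self.energy_j / self.runtime_s


def decompose_energy_change(base: Run, opt: Run) -> tuple[float, float]:
    """Split (opt.energy_j - base.energy_j) into a runtime term and a power term.

    Assumed rule: each factor's change is weighted by the average of the
    other factor, so the two terms sum exactly to the total energy change.
    """
    p0, p1 = base.mean_power_w, opt.mean_power_w
    t0, t1 = base.runtime_s, opt.runtime_s
    runtime_term = (t1 - t0) * (p0 + p1) / 2.0  # joules attributed to runtime change
    power_term = (p1 - p0) * (t0 + t1) / 2.0    # joules attributed to power change
    return runtime_term, power_term


# Made-up example values, not measurements from Frontier or Portage.
baseline = Run(energy_j=200_000.0, runtime_s=100.0)   # 2000 W mean
optimized = Run(energy_j=60_000.0, runtime_s=40.0)    # 1500 W mean
runtime_term, power_term = decompose_energy_change(baseline, optimized)
```

With these made-up inputs, most of the 140 kJ reduction is assigned to the shorter runtime and the remainder to the lower mean power. Other decomposition rules (for example, a logarithmic split) would give different shares.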
📝 Abstract
Modern exascale GPU- and APU-based systems provide multiple power and energy sensors, but differences in scope, update rate, timing, and filtering complicate the attribution of short-lived accelerator activity. This paper presents a methodology to characterize and correct these effects on Cray EX systems with AMD Instinct MI250X GPUs (Frontier) and MI300A APUs (Portage). Using controlled square-wave workloads, we quantify update intervals, delay, aliasing, and variability across up to 512 GPUs and 480 APUs with on-chip (rocm-smi/amd-smi) and off-chip Cray Power Management sensors. We reconstruct power from cumulative energy counters to achieve faster response times, validate it against on-chip, off-chip, and node-level sensors, and integrate the resulting streams into a Score-P/PAPI-based tool for time-aligned, phase-level attribution. Applied to rocHPL, rocHPL-MxP, and HPG-MxP, the method separates energy savings due to reduced runtime from changes in power. Mixed precision reduces node energy on Frontier by 79% for rocHPL-MxP and 31% for HPG-MxP, with similar trends on Portage. These results provide portable guidance for sensor validation and power-aware optimization on current and future exascale systems.
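The idea of reconstructing power from a cumulative energy counter can be sketched as a finite difference over successive counter samples. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the input layout (parallel lists of timestamps and joule readings), and the optional `wrap_j` rollover handling are all hypothetical.

```python
def power_from_energy(timestamps_s, energy_j, wrap_j=None):
    """Estimate average power between successive cumulative-energy samples.

    timestamps_s: sample times in seconds (assumed increasing).
    energy_j:     cumulative energy readings in joules at those times.
    wrap_j:       hypothetical counter range in joules; if given, a negative
                  difference is treated as a single counter rollover.

    Returns a list of (interval_midpoint_s, average_power_w) pairs.
    """
    if len(timestamps_s) != len(energy_j):
        raise ValueError("timestamps and energy readings must have equal length")

    samples = []
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt <= 0:
            # Skip repeated or out-of-order timestamps rather than divide by zero.
            continue
        de = energy_j[i] - energy_j[i - 1]
        if de < 0 and wrap_j is not None:
            de += wrap_j  # assume exactly one rollover between samples
        midpoint = 0.5 * (timestamps_s[i] + timestamps_s[i - 1])
        samples.append((midpoint, de / dt))
    return samples


# Synthetic square-wave-like input: 100 W for two seconds, then 500 W.
times = [0, 1, 2, 3, 4]
energy = [0, 100, 200, 700, 1200]
reconstructed = power_from_energy(times, energy)
```

Because each estimate depends only on two adjacent counter readings, the result follows a step change within one sampling interval, whereas a filtered instantaneous power reading may lag behind it.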
Problem

Research questions and friction points this paper is trying to address.

power attribution
energy measurement
GPU/APU systems
exascale computing
sensor discrepancy
Innovation

Methods, ideas, or system contributions that make the work stand out.

fine-grained energy attribution
AMD GPU/APU
power sensor characterization
mixed-precision energy efficiency
exascale power monitoring
Adam McDaniel
Dept. of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
Michael Jantz
University of Tennessee
Runtime Systems, Compilers, Operating Systems
Ashesh Sharma
Hewlett Packard Enterprise (HPE), Bloomington, MN, USA
Steve Abbott
Hewlett Packard Enterprise (HPE), Bloomington, MN, USA
Steven Martin
Hewlett Packard Enterprise (HPE), Bloomington, MN, USA
Shreyas Khandekar
Hewlett Packard Enterprise (HPE), Bloomington, MN, USA
Brandon Neth
Hewlett Packard Enterprise (HPE), Bloomington, MN, USA
Bruno Villasenor Alvarez
Advanced Micro Devices, Inc., Santa Clara, CA, USA
Aditya Kashi
Oak Ridge National Laboratory
Scalable numerical methods, high-performance computing
Wael Elwasif
Oak Ridge National Laboratory
Oscar Hernandez
Oak Ridge National Laboratory
High Performance Computing, Compilers, Code Transformation Tools, Performance Optimizations