Wattchmen: Watching the Wattchers -- High Fidelity, Flexible GPU Energy Modeling

📅 2026-03-27
🤖 AI Summary
Existing GPU energy estimation methods often suffer from low accuracy, limited flexibility, or outdated architectural assumptions, making them inadequate for fine-grained energy analysis in modern high-performance computing. To address this gap, this work proposes Wattchmen, a high-fidelity, cross-architecture, instruction-level GPU energy modeling framework. By constructing a per-instruction energy model calibrated with diverse microbenchmarks, Wattchmen enables energy prediction and attribution on V100 GPUs, including water-cooled V100s, and extends to air-cooled A100 and H100 GPUs. Evaluated on 16 GPGPU, graph analytics, HPC, and ML workloads, Wattchmen achieves a mean absolute percent error (MAPE) of 14% on the V100, compared with 32% for AccelWattch and 25% for Guser. Its insights also guided energy optimizations for applications such as Backprop and QMCPACK, yielding energy reductions of up to 35%.
📝 Abstract
Modern GPU-rich HPC systems are increasingly becoming energy-constrained. Thus, understanding an application's energy consumption becomes essential. Unfortunately, current GPU energy attribution techniques are either inaccurate, inflexible, or outdated. Therefore, we propose Wattchmen, a flexible methodology for measuring, attributing, and predicting GPU energy consumption. We construct a per-instruction energy model using a diverse set of microbenchmarks to systematically quantify the energy consumption of GPU instructions, enabling finer-grain prediction and energy consumption breakdowns for applications. Compared with the state-of-the-art systems like AccelWattch (32%) and Guser (25%), across 16 popular GPGPU, graph analytics, HPC, and ML workloads, Wattchmen reduces the mean absolute percent error (MAPE) to 14% on V100 GPUs. Furthermore, we show that Wattchmen provides similar MAPEs for water-cooled V100s (15%) and extends to later architectures, including air-cooled A100 (11%) and H100 (12%) GPUs. Finally, to further demonstrate Wattchmen's value, we apply it to applications such as Backprop and QMCPACK, where Wattchmen's insights enable energy reductions of up to 35%.
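The two quantities the abstract relies on, a per-instruction energy breakdown and the MAPE metric, can be illustrated with a minimal sketch. This is not the paper's implementation: the instruction names, the per-instruction energy values, and the simple "count times energy" model are assumptions chosen for illustration, and the helper names `predict_energy_nj` and `mape` are hypothetical.

```python
from typing import Dict, List

# Hypothetical per-instruction energies in nanojoules.
# Illustrative values only, not measurements from the paper.
ENERGY_NJ: Dict[str, float] = {"FADD": 1.0, "FFMA": 1.5, "LDG": 10.0}


def predict_energy_nj(counts: Dict[str, int],
                      energy_nj: Dict[str, float]) -> Dict[str, float]:
    """Attribute energy to each instruction type as count * per-instruction energy.

    Assumed linear model; the paper's actual model may include other terms.
    """
    return {op: n * energy_nj[op] for op, n in counts.items()}


def mape(measured: List[float], predicted: List[float]) -> float:
    """Mean absolute percent error across workloads, in percent."""
    errors = [abs(p - m) / m for m, p in zip(measured, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical dynamic instruction counts for one kernel.
counts = {"FADD": 1000, "FFMA": 2000, "LDG": 500}
breakdown = predict_energy_nj(counts, ENERGY_NJ)  # per-instruction attribution
total = sum(breakdown.values())                   # predicted kernel energy

# MAPE over two hypothetical workloads (measured vs. predicted energy).
error_pct = mape([100.0, 200.0], [125.0, 150.0])
```

In this sketch the breakdown shows which instruction types dominate the predicted energy, which is the kind of attribution the abstract describes as guiding optimizations, while `mape` is the accuracy metric reported for each GPU.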
Problem

Research questions and friction points this paper is trying to address.

GPU energy modeling
energy attribution
energy consumption prediction
high-performance computing
power measurement
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU energy modeling
instruction-level energy attribution
microbenchmark-based characterization
cross-architecture energy prediction
energy-efficient HPC