Wattchmen: Watching the Wattchers -- High Fidelity, Flexible GPU Energy Modeling

📅 2026-03-27

🤖 AI Summary

Existing GPU power estimation methods often suffer from low accuracy, limited flexibility, or outdated architectural assumptions, making them inadequate for fine-grained energy analysis on modern high-performance computing systems. To address this gap, this work proposes Wattchmen, a high-fidelity, instruction-level GPU energy modeling framework that extends across architectures. By constructing per-instruction energy models calibrated with diverse microbenchmarks, Wattchmen enables energy prediction and attribution on air-cooled and water-cooled V100 GPUs as well as air-cooled A100 and H100 GPUs. Evaluation on 16 GPGPU, graph analytics, HPC, and ML workloads shows a mean absolute percent error (MAPE) of 14% on the V100, compared with 32% for AccelWattch and 25% for Guser. Wattchmen's insights also guided energy optimizations for applications such as Backprop and QMCPACK, yielding energy reductions of up to 35%.
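To make the idea of calibrating per-instruction energies from microbenchmarks concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes each microbenchmark is dominated by a single opcode and that a constant idle power can be subtracted from the measured energy. The `MicrobenchRun` type, the `calibrate` function, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MicrobenchRun:
    """One measured microbenchmark run (hypothetical record layout)."""
    opcode: str        # dominant instruction, e.g. "FADD"
    instr_count: int   # number of times the instruction executed
    energy_j: float    # measured total energy in joules
    runtime_s: float   # measured runtime in seconds


def calibrate(runs, idle_power_w):
    """Estimate dynamic energy per instruction (joules) for each opcode.

    Assumption: dynamic energy = measured energy - idle power * runtime.
    Runs that share an opcode are pooled before dividing.
    """
    totals = {}
    for r in runs:
        dynamic_j = r.energy_j - idle_power_w * r.runtime_s
        energy, count = totals.get(r.opcode, (0.0, 0))
        totals[r.opcode] = (energy + dynamic_j, count + r.instr_count)
    return {op: energy / count for op, (energy, count) in totals.items()}


# Illustrative numbers only, not measurements from the paper.
runs = [
    MicrobenchRun("FADD", 1_000_000_000, 60.0, 1.0),
    MicrobenchRun("FADD", 2_000_000_000, 120.0, 2.0),
    MicrobenchRun("LDG", 500_000_000, 90.0, 1.0),
]
per_instr = calibrate(runs, idle_power_w=50.0)
for opcode, joules in per_instr.items():
    print(f"{opcode}: {joules * 1e9:.1f} nJ per instruction")
```

Pooling runs per opcode before dividing keeps the estimate stable when the same instruction is measured at several problem sizes. A real calibration would also need to handle instructions that cannot be isolated in a single microbenchmark.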

📝 Abstract

Modern GPU-rich HPC systems are increasingly becoming energy-constrained. Thus, understanding an application's energy consumption becomes essential. Unfortunately, current GPU energy attribution techniques are either inaccurate, inflexible, or outdated. Therefore, we propose Wattchmen, a flexible methodology for measuring, attributing, and predicting GPU energy consumption. We construct a per-instruction energy model using a diverse set of microbenchmarks to systematically quantify the energy consumption of GPU instructions, enabling finer-grain prediction and energy consumption breakdowns for applications. Compared with the state-of-the-art systems like AccelWattch (32%) and Guser (25%), across 16 popular GPGPU, graph analytics, HPC, and ML workloads, Wattchmen reduces the mean absolute percent error (MAPE) to 14% on V100 GPUs. Furthermore, we show that Wattchmen provides similar MAPEs for water-cooled V100s (15%) and extends to later architectures, including air-cooled A100 (11%) and H100 (12%) GPUs. Finally, to further demonstrate Wattchmen's value, we apply it to applications such as Backprop and QMCPACK, where Wattchmen's insights enable energy reductions of up to 35%.
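The prediction, breakdown, and MAPE figures mentioned in the abstract can be illustrated with a short Python sketch. It assumes a simple additive model (instruction count times per-instruction energy, plus a static term), which may differ from the model the paper actually uses. The function names, opcodes, and numbers below are hypothetical.

```python
def predict_energy(instr_counts, energy_per_instr_j, static_power_w, runtime_s):
    """Return (total_joules, breakdown) under an assumed additive model.

    breakdown maps each opcode to its share of dynamic energy, plus a
    "static" entry for static_power_w * runtime_s.
    """
    breakdown = {
        op: count * energy_per_instr_j[op] for op, count in instr_counts.items()
    }
    breakdown["static"] = static_power_w * runtime_s
    return sum(breakdown.values()), breakdown


def mape(measured, predicted):
    """Mean absolute percent error of predictions against measurements."""
    errors = [abs(p - m) / m for m, p in zip(measured, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Illustrative numbers only, not results from the paper.
energy_per_instr_j = {"FADD": 1e-8, "LDG": 8e-8}
kernel_counts = {"FADD": 2_000_000_000, "LDG": 500_000_000}

total_j, breakdown = predict_energy(
    kernel_counts, energy_per_instr_j, static_power_w=50.0, runtime_s=1.0
)
print(f"predicted total: {total_j:.1f} J")
for component, joules in breakdown.items():
    print(f"  {component}: {joules:.1f} J")

# MAPE across two hypothetical workloads (measured vs. predicted joules).
error_pct = mape(measured=[100.0, 200.0], predicted=[110.0, 180.0])
print(f"MAPE: {error_pct:.1f}%")
```

The per-component breakdown is what makes attribution possible: a kernel whose energy is dominated by memory instructions suggests a different optimization than one dominated by arithmetic.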

Problem

Research questions and friction points this paper is trying to address.

GPU energy modeling

energy attribution

energy consumption prediction

high-performance computing

power measurement

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU energy modeling

instruction-level energy attribution

microbenchmark-based characterization