🤖 AI Summary
Existing AI training energy estimation methods rely heavily on manufacturer-specified thermal design power (TDP), leading to substantial inaccuracies (27–37% error) due to unaccounted hardware-level power dynamics and architectural heterogeneity.
Method: This work develops an architecture-aware, computation-intensity-driven statistical power model grounded in empirical power measurements from an eight-GPU NVIDIA H100 node and open-source benchmarks. It introduces floating-point operations (FLOPs) as a calibration factor and incorporates explicit architecture classification (e.g., Transformer vs. CNN) to capture divergent dynamic power behaviors.
Contribution/Results: We empirically demonstrate that H100 training power consumption reaches only 76% of its TDP—first such quantification—and reveal pronounced power-profile disparities between Transformer- and CNN-based workloads. Our model achieves a mean absolute percentage error of 11.4%, more than doubling the accuracy of TDP-based estimation. Furthermore, it enables quantitative assessment of grid power fluctuation risks induced by Transformer workloads, providing a robust metrological foundation for green AI infrastructure planning and environmental impact evaluation.
📝 Abstract
As AI's energy demand continues to grow, it is critical to enhance the understanding of characteristics of this demand, to improve grid infrastructure planning and environmental assessment. By combining empirical measurements from Brookhaven National Laboratory during AI training on 8-GPU H100 systems with open-source benchmarking data, we develop statistical models relating computational intensity to node-level power consumption. We measure the gap between manufacturer-rated thermal design power (TDP) and actual power demand during AI training. Our analysis reveals that even computationally intensive workloads operate at only 76% of the 10.2 kW TDP rating. Our architecture-specific model, calibrated to floating-point operations, predicts energy consumption with 11.4% mean absolute percentage error, significantly outperforming TDP-based approaches (27-37% error). We identified distinct power signatures between transformer and CNN architectures, with transformers showing characteristic fluctuations that may impact grid stability.