🤖 AI Summary
This study evaluates the trade-off between performance and energy efficiency of the NVIDIA H100 (HBM2e) and H200 (HBM3e) GPUs under varying power caps. It combines power-capping experiments, two microbenchmarks (DGEMM and TheBandwidthBenchmark) that represent the compute-bound and memory-bound extremes of the Roofline model, and regression analysis. The work shows notable differences in memory power behavior between the two GPUs and identifies outlier samples with unusually high memory power consumption. Across power caps, the H100 achieves slightly better energy efficiency for compute-bound workloads, whereas the H200 is more efficient for memory-bound workloads. These results offer empirical guidance for energy-efficient GPU selection and system optimization.
📝 Abstract
Modern NVIDIA GPUs such as the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth. With memory bandwidth isolated as the key variable, the power distribution between the memory and the Streaming Multiprocessors (SMs) differs notably between the two architectures. In the era of energy-efficient computing, analyzing how these hardware characteristics affect performance per watt is critical. This study investigates how the H100 and H200 manage memory power consumption at various power-cap levels. Using regression analysis, we study the memory power limit and uncover outliers that consume more memory power. To evaluate efficiency, we employ a compute-bound workload (DGEMM) and a memory-bound workload (TheBandwidthBenchmark), representing the two extremes of the Roofline model. Our observations indicate that, across varying power caps, the H100 remains the slightly better choice for strictly compute-bound workloads, whereas the H200 demonstrates superior efficiency for memory-bound applications.
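As a minimal illustration of the two Roofline extremes and the performance-per-watt metric mentioned above, the following Python sketch computes the standard Roofline bound and divides it by a power draw. The function names and all numeric values are hypothetical placeholders chosen for easy arithmetic; they are not measurements or specifications of the H100 or H200, and this is not the authors' analysis code.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity_flop_per_byte):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    memory_roof = bandwidth_gbs * intensity_flop_per_byte
    return min(peak_gflops, memory_roof)


def gflops_per_watt(gflops, power_watts):
    """Energy efficiency as performance divided by power draw."""
    return gflops / power_watts


# Placeholder device parameters (assumed values, not real GPU specifications).
PEAK_GFLOPS = 1000.0    # compute roof
BANDWIDTH_GBS = 100.0   # memory bandwidth
POWER_CAP_W = 200.0     # assumed power draw under a cap

# Arithmetic intensity at which the two roofs meet (the ridge point).
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS

# A low-intensity kernel sits on the memory roof (memory-bound extreme).
memory_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 0.5)

# A high-intensity kernel sits on the compute roof (compute-bound extreme).
compute_bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, 50.0)

print("ridge point [flop/byte]:", ridge_point)
print("memory-bound [GFLOP/s]:", memory_bound)
print("compute-bound [GFLOP/s]:", compute_bound)
print("memory-bound efficiency [GFLOP/s/W]:",
      gflops_per_watt(memory_bound, POWER_CAP_W))
```

In this simplified picture, raising the memory bandwidth lifts only the memory roof, so it can change the attainable performance of a memory-bound kernel while leaving a compute-bound kernel at the compute roof. How power draw shifts between memory and SMs under a cap is what the study measures and is not modeled here.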