Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of Power-Capping NVIDIA H100 and H200

📅 2026-04-13

🤖 AI Summary

This study systematically evaluates the performance and energy-efficiency trade-offs of NVIDIA H100 (HBM2e) and H200 (HBM3e) GPUs under varying power caps in energy-conscious computing scenarios. Using power-capping experiments with the DGEMM and TheBandwidthBenchmark microbenchmarks, combined with Roofline modeling and regression analysis, the work reports, for the first time, significant differences in memory power behavior between the two architectures and identifies anomalous samples with abnormally high memory power consumption. The findings show that the H100 achieves slightly better energy efficiency for compute-bound workloads, whereas the H200 is more energy-efficient for memory-intensive tasks. These results offer empirical guidance for energy-efficient GPU selection and system optimization.
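The regression-and-outlier step described in this summary can be illustrated with a small sketch. It is not the authors' method: it assumes each sample pairs an applied power cap (W) with a measured memory power (W), fits an ordinary least-squares line, and flags samples whose residual lies far above the trend using a median-absolute-deviation (MAD) threshold. The helper names `fit_line` and `flag_high_memory_power`, the threshold `k=3.5`, and the data are all hypothetical.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_high_memory_power(power_caps_w, mem_power_w, k=3.5):
    """Indices of samples whose memory power sits far ABOVE the fitted trend.

    Hypothetical helper: OLS residuals are scored against a robust scale
    (MAD x 1.4826, which approximates one standard deviation for Gaussian
    noise). Only positive deviations are flagged, i.e. samples drawing
    more memory power than the linear trend predicts.
    """
    slope, intercept = fit_line(power_caps_w, mem_power_w)
    residuals = [y - (slope * x + intercept)
                 for x, y in zip(power_caps_w, mem_power_w)]
    center = median(residuals)
    scale = 1.4826 * median([abs(r - center) for r in residuals])
    if scale == 0:
        return []  # MAD is zero: no usable scale estimate
    return [i for i, r in enumerate(residuals) if (r - center) / scale > k]


if __name__ == "__main__":
    # Synthetic, illustrative numbers only; NOT measurements from the study.
    caps = [200, 250, 300, 350, 400, 450, 500, 550, 600]        # W
    noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.3, -0.2, 0.1]
    mem = [0.05 * c + 40 + e for c, e in zip(caps, noise)]    # W
    mem[4] += 15.0  # inject one sample with abnormally high memory power
    print(flag_high_memory_power(caps, mem))  # → [4]
```

The MAD is used for the threshold because a single outlier inflates the ordinary standard deviation of the residuals and can hide itself. The OLS line is still pulled toward outliers, so a robust fit such as Theil-Sen may be preferable on real measurements.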

📝 Abstract

Modern NVIDIA GPUs such as the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth. With memory bandwidth isolated as the key differing variable, the distribution of power between the memory and the Streaming Multiprocessors (SMs) changes notably between the two architectures. In the era of energy-efficient computing, analyzing how these hardware characteristics affect performance per watt is critical. This study investigates how the H100 and H200 manage memory power consumption at various power-cap levels. Using regression analysis, we characterize the memory power limit and identify outliers that consume abnormally high memory power. To evaluate efficiency, we employ compute-bound (DGEMM) and memory-bound (TheBandwidthBenchmark) workloads, which represent the two extremes of the Roofline model. Our observations indicate that, across power caps, the H100 remains the slightly better choice for strictly compute-bound workloads, whereas the H200 demonstrates superior efficiency for memory-bound applications.
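The Roofline reasoning behind this choice of workloads can be made concrete with a short sketch. Attainable performance is bounded by min(peak compute, arithmetic intensity × memory bandwidth), so two GPUs with the same compute roof but different bandwidths differ only on the memory-bound side. The peak and bandwidth figures are hypothetical placeholders, not H100/H200 datasheet values; the DGEMM traffic model (each matrix moved exactly once) and the triad form `a[i] = b[i] + s*c[i]` are simplifying assumptions, and all function names are illustrative.

```python
def attainable_gflops(intensity_flop_per_byte, peak_gflops, bandwidth_gb_s):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, intensity_flop_per_byte * bandwidth_gb_s)


def ridge_point(peak_gflops, bandwidth_gb_s):
    """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
    return peak_gflops / bandwidth_gb_s


def dgemm_intensity(n):
    """Square FP64 GEMM: 2*n^3 FLOPs; assumes A, B, C each move once (24*n^2 bytes)."""
    return (2 * n**3) / (24 * n**2)


# Assumed triad a[i] = b[i] + s*c[i]: 2 FLOPs per 3 doubles (24 bytes) moved.
TRIAD_INTENSITY = 2 / 24


def gflops_per_watt(gflops, avg_power_w):
    """Energy efficiency: achieved GFLOP/s divided by average power draw."""
    return gflops / avg_power_w


if __name__ == "__main__":
    # Hypothetical placeholder specs: same FP64 peak, different bandwidth.
    peak = 50_000.0  # GFLOP/s
    for name, bw in [("lower-bandwidth GPU", 2_000.0),
                     ("higher-bandwidth GPU", 4_000.0)]:  # GB/s
        print(name,
              "ridge:", ridge_point(peak, bw),
              "DGEMM bound:", attainable_gflops(dgemm_intensity(8192), peak, bw),
              "triad bound:", round(attainable_gflops(TRIAD_INTENSITY, peak, bw), 1))
```

With these toy numbers, DGEMM reaches the same compute roof on both devices, while the triad's bound doubles with bandwidth. Performance per watt then depends on the average power each device actually draws under a given cap, which is what the power-capping measurements supply.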

Problem

Research questions and friction points this paper is trying to address.

power-capping, memory bandwidth, energy efficiency, GPU architecture, performance per watt

Innovation

Methods, ideas, or system contributions that make the work stand out.

power-capping, memory bandwidth, energy efficiency