🤖 AI Summary
This study addresses the lack of systematic benchmarking methodologies for evaluating the performance–energy trade-offs of state-of-the-art AI workloads across diverse GPU architectures, a gap that hinders the deployment of energy-efficient computing infrastructures. The authors present a throughput-oriented benchmarking framework tailored for computer vision and large language models, empirically assessing the real-world energy efficiency of NVIDIA H100, NVIDIA H200, and AMD MI300X GPUs under varying power constraints. Their analysis reveals that the optimal power limit depends strongly on both workload characteristics and hardware architecture, with the H100 and H200 exhibiting markedly different energy-efficiency profiles due to disparities in their HBM configurations. These findings provide an empirical foundation for fine-grained power management in AI infrastructure, and the accompanying benchmarking tools have been open-sourced.
📝 Abstract
Artificial Intelligence (AI) workloads drive a rapid expansion of high-performance computing (HPC) infrastructures and push their power and energy demands towards critical levels. Benchmarks that represent state-of-the-art AI workloads, together with an understanding of their performance–energy trade-offs, are critical for deploying efficient infrastructures and can guide energy-efficiency measures such as power capping. We introduce a benchmarking framework with popular deep learning applications from computer vision (image classification and generation) and large language models (continued pre-training and inference) implementing modern methods. Our performance analysis focuses on throughput rather than time to completion, which is the standard metric in HPC. We analyse performance and energy efficiency under various power-capping scenarios on NVIDIA H100, NVIDIA H200, and AMD MI300X GPUs. Our results reveal that no universal optimal power cap exists, as the efficiency peak varies across application types and GPU architectures. Interestingly, the two NVIDIA GPUs, which differ mainly in their HBM configuration, show qualitatively different performance–energy trade-offs. The developed benchmarking framework will be released as a public tool.
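The analysis the abstract describes, sweeping power caps and locating the efficiency peak per workload, can be sketched as throughput per joule computed over a cap sweep. A minimal illustration follows; all power-cap values, throughputs, and power readings here are made-up assumptions for demonstration, not measurements from the paper.

```python
# Hypothetical power-cap sweep for one workload on one GPU.
# For each cap (W) we record sustained throughput (samples/s) and
# average power draw (W). All numbers are illustrative only.
sweep = {
    700: {"throughput": 1000.0, "avg_power": 680.0},
    600: {"throughput": 980.0, "avg_power": 590.0},
    500: {"throughput": 930.0, "avg_power": 495.0},
    400: {"throughput": 820.0, "avg_power": 398.0},
}

def energy_efficiency(throughput: float, avg_power: float) -> float:
    """Samples per joule: (samples/s) / (J/s)."""
    return throughput / avg_power

# Efficiency at each cap; the peak is the cap maximising samples/joule,
# and (per the paper's finding) it shifts with workload and architecture.
eff = {cap: energy_efficiency(m["throughput"], m["avg_power"])
       for cap, m in sweep.items()}
best_cap = max(eff, key=eff.get)
print(best_cap, round(eff[best_cap], 3))
```

With these illustrative numbers the lowest cap wins on samples per joule even though raw throughput drops, which is exactly the throughput-versus-efficiency trade-off the benchmark framework is built to expose.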