Datacenter Energy Optimized Power Profiles

📅 2025-10-04
🤖 AI Summary
Data centers must balance energy efficiency against performance for HPC and AI workloads while operating under strict facility power limits. This paper presents datacenter power profiles, an NVIDIA software feature released with Blackwell B200 that gives users coarse-grain, workload-aware control over GPU power management. The profiles combine hardware and software power-management features, including Max-Q, with domain knowledge of HPC and AI workloads to produce optimization recipes. The phase-1 Blackwell implementation reports up to 15% energy savings while keeping performance above 97% for critical applications, which enables an overall throughput increase of up to 13% in a power-constrained facility.

📝 Abstract
This paper presents datacenter power profiles, a new NVIDIA software feature released with Blackwell B200, aimed at improving energy efficiency and/or performance. The initial feature provides coarse-grain user control for HPC and AI workloads leveraging hardware and software innovations for intelligent power management and domain knowledge of HPC and AI workloads. The resulting workload-aware optimization recipes maximize computational throughput while operating within strict facility power constraints. The phase-1 Blackwell implementation achieves up to 15% energy savings while maintaining performance levels above 97% for critical applications, enabling an overall throughput increase of up to 13% in a power-constrained facility.

Keywords: GPU power management, energy efficiency, power profile, HPC optimization, Max-Q, Blackwell architecture

Problem

Research questions and friction points this paper is trying to address.

Optimizing datacenter energy efficiency and performance through power profiles
Managing computational throughput within strict facility power constraints
Applying intelligent power management to HPC and AI workloads
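
The link between per-GPU power savings and facility-level throughput can be illustrated with a small calculation. The sketch below is not the paper's model: it assumes a facility whose total power budget is fixed and fully used by GPUs, so that lowering per-GPU power lets proportionally more GPUs run. The function name `throughput_gain` and the input values are hypothetical and chosen only for illustration; the paper's "up to" figures come from its own measurements and need not combine this way.

```python
def throughput_gain(power_fraction: float, perf_fraction: float) -> float:
    """Relative facility throughput gain under a fixed power budget.

    Assumption (not from the paper): the facility budget is fully spent on
    GPUs, so the number of GPUs that fit scales as 1 / power_fraction.

    power_fraction: per-GPU power with the profile / baseline per-GPU power.
    perf_fraction:  per-GPU performance with the profile / baseline performance.
    """
    if not 0.0 < power_fraction <= 1.0:
        raise ValueError("power_fraction must be in (0, 1]")
    if perf_fraction <= 0.0:
        raise ValueError("perf_fraction must be positive")

    # More GPUs fit in the same budget when each one draws less power.
    gpu_scale = 1.0 / power_fraction
    # Facility throughput = (number of GPUs) x (per-GPU performance).
    return gpu_scale * perf_fraction - 1.0


if __name__ == "__main__":
    # Illustrative inputs: 10% lower per-GPU power, 98% of baseline performance.
    gain = throughput_gain(power_fraction=0.90, perf_fraction=0.98)
    print(f"Facility throughput gain: {gain:.1%}")
```

Under this simplified model, a small per-GPU performance loss can still yield a net throughput gain, because the freed power headroom is spent on additional GPUs.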
Innovation

Methods, ideas, or system contributions that make the work stand out.

Software feature enabling intelligent power management control
Hardware-software integration for workload-aware optimization recipes
Phase-1 Blackwell implementation achieving up to 15% energy savings while keeping performance above 97% for critical applications
Authors

Sreedhar Narayanaswamy
Silicon Solutions Group, Nvidia, Santa Clara, USA
Pratikkumar Dilipkumar Patel
Software, Nvidia, Santa Clara, USA
Ian Karlin
Lawrence Livermore National Laboratory
Apoorv Gupta
Software, Nvidia, Santa Clara, USA
Sudhir Saripalli
Software, Nvidia, Santa Clara, USA
Janey Guo
Silicon Solutions Group, Nvidia, Shanghai, China