Power-Capping Metric Evaluation for Improving Energy Efficiency

📅 2025-05-27
🤖 AI Summary
Problem: Power optimization for exascale supercomputing remains challenging, particularly on heterogeneous architectures such as the NVIDIA GH200 superchip. Method: This work evaluates runtime power capping with integrated CPU-GPU power management, using two complementary energy-performance metrics: speedup-energy-delay and a Euclidean-distance-based multi-objective optimization method. It combines GPU task-specific dynamic power-cap adjustments with integrated CPU-GPU power steering. Contribution/Results: Evaluated on the mostly compute-bound LSMS scientific application, the method shows that moderate GPU power reduction preserves computational performance while improving system energy efficiency, achieving 12.7% global energy savings with marginal latency overhead (<3.2%). The work lays the groundwork for adaptive energy-efficiency optimization in exascale systems.

📝 Abstract
With high-performance computing systems now running at exascale, optimizing power-scaling management and resource utilization has become more critical than ever. This paper explores runtime power-capping optimizations that leverage integrated CPU-GPU power management on architectures like the NVIDIA GH200 superchip. We evaluate energy-performance metrics that account for simultaneous CPU and GPU power-capping effects by using two complementary approaches: speedup-energy-delay and a Euclidean distance-based multi-objective optimization method. By targeting a mostly compute-bound exascale science application, the Locally Self-Consistent Multiple Scattering (LSMS), we explore challenging scenarios to identify potential opportunities for energy savings in exascale applications, and we recognize that even modest reductions in energy consumption can have significant overall impacts. Our results highlight how GPU task-specific dynamic power-cap adjustments combined with integrated CPU-GPU power steering can improve the energy utilization of certain GPU tasks, thereby laying the groundwork for future adaptive optimization strategies.
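To make the Euclidean distance-based approach concrete, here is a minimal sketch of one common way such a selection can work: normalize runtime and energy across the candidate power-cap settings, then pick the setting closest to the ideal point (0, 0). The abstract does not spell out the paper's exact formulation, so the min-max normalization, the function names, and the sample measurements below are all assumptions made for illustration.

```python
import math


def normalize(values):
    """Min-max normalize to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def closest_to_ideal(configs):
    """Pick the configuration nearest the ideal point (0, 0).

    configs: list of (label, runtime_s, energy_j) tuples.
    Returns the best label and the distance of every configuration.
    """
    times = normalize([c[1] for c in configs])
    energies = normalize([c[2] for c in configs])
    # Euclidean distance in the normalized (runtime, energy) plane.
    distances = [math.hypot(t, e) for t, e in zip(times, energies)]
    best = min(range(len(configs)), key=distances.__getitem__)
    return configs[best][0], distances


# Hypothetical measurements, NOT taken from the paper.
SAMPLE = [
    ("gpu_cap_high", 100.0, 50000.0),
    ("gpu_cap_mid", 104.0, 44000.0),
    ("gpu_cap_low", 130.0, 43000.0),
]

if __name__ == "__main__":
    label, distances = closest_to_ideal(SAMPLE)
    print(label, [round(d, 3) for d in distances])
```

With these made-up numbers the middle cap wins: it gives up a little runtime for most of the available energy savings, which mirrors the trade-off the abstract describes.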
Problem

Research questions and friction points this paper is trying to address.

Optimizing power-scaling management in exascale computing systems
Evaluating energy-performance metrics for CPU-GPU power-capping effects
Improving energy efficiency via dynamic power-cap adjustments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated CPU-GPU power management optimization
Speedup-energy-delay and Euclidean distance metrics
Dynamic GPU task-specific power-cap adjustments
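The metrics listed above can be illustrated with a small sketch. This page does not give the paper's exact definition of speedup-energy-delay, so the form used here (the energy-delay product divided by speedup relative to an uncapped baseline, lower is better) is an assumption chosen for illustration, as are the function names and the sample numbers.

```python
def speedup(t_base_s, t_s):
    """Speedup relative to the baseline run (below 1.0 means a slowdown)."""
    return t_base_s / t_s


def energy_delay_product(energy_j, t_s):
    """Classic energy-delay product: energy times runtime."""
    return energy_j * t_s


def speedup_energy_delay(energy_j, t_s, t_base_s):
    """ASSUMED form: energy-delay product penalized by lost speedup."""
    return energy_delay_product(energy_j, t_s) / speedup(t_base_s, t_s)


# Hypothetical (runtime_s, energy_j) per power-cap setting, NOT from the paper.
RUNS = {
    "gpu_cap_high": (100.0, 50000.0),  # treated as the uncapped baseline
    "gpu_cap_mid": (104.0, 44000.0),
    "gpu_cap_low": (130.0, 43000.0),
}
T_BASE = RUNS["gpu_cap_high"][0]

scores = {
    label: speedup_energy_delay(energy, runtime, T_BASE)
    for label, (runtime, energy) in RUNS.items()
}
best = min(scores, key=scores.get)

if __name__ == "__main__":
    for label, score in scores.items():
        print(f"{label}: {score:.0f}")
    print("lowest score:", best)
```

Under this assumed definition, an aggressive cap is penalized twice for its slowdown (once through delay, once through speedup), so a moderate cap that saves energy at a small runtime cost scores best.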
Maria Patrou
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Thomas Wang
Camas High School, Camas, Washington, USA
Wael Elwasif
Oak Ridge National Laboratory
M. Eisenbach
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Ross Miller
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
William Godoy
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
Oscar Hernandez
Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA