EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems

📅 2026-04-19

🤖 AI Summary
This work addresses the challenge of reallocating power efficiently in clusters that operate under strict power budgets. Existing power management strategies fall short because they ignore how differently CPU-bound and GPU-bound applications respond to power caps. To overcome this limitation, we propose EcoShift, a framework that, for the first time, integrates application-level performance sensitivity to CPU and GPU power allocation into cluster-wide power management. EcoShift combines online performance prediction with a dynamic-programming-based allocator to redistribute reclaimed power in a fine-grained, performance-aware way. In emulation-based evaluation on two heterogeneous platforms with Intel CPUs and NVIDIA A100/H100 GPUs, EcoShift consistently respects the cluster-wide power constraint while achieving up to 6% higher average performance than state-of-the-art approaches.
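The summary does not say how EcoShift's online performance prediction works. As a stand-in, the following hypothetical sketch keeps per-device samples of (power cap, normalized performance) observed while an application runs and predicts unseen caps by piecewise-linear interpolation. `CapSensitivityModel`, its methods, and the sample numbers are illustrative assumptions, not the paper's predictor.

```python
from bisect import bisect_left
from typing import List


class CapSensitivityModel:
    """Hypothetical online model of performance versus one device's power cap.

    Stores observed (cap_watts, normalized_perf) samples sorted by cap and
    predicts unseen caps by linear interpolation between neighbouring samples,
    clamping to the nearest sample outside the observed range.
    """

    def __init__(self) -> None:
        self.caps: List[float] = []
        self.perf: List[float] = []

    def observe(self, cap_watts: float, normalized_perf: float) -> None:
        """Record a measurement; a newer sample at the same cap replaces the old one."""
        i = bisect_left(self.caps, cap_watts)
        if i < len(self.caps) and self.caps[i] == cap_watts:
            self.perf[i] = normalized_perf
        else:
            self.caps.insert(i, cap_watts)
            self.perf.insert(i, normalized_perf)

    def predict(self, cap_watts: float) -> float:
        """Predicted normalized performance at the given cap."""
        if not self.caps:
            raise ValueError("no samples observed yet")
        if cap_watts <= self.caps[0]:
            return self.perf[0]
        if cap_watts >= self.caps[-1]:
            return self.perf[-1]
        i = bisect_left(self.caps, cap_watts)  # caps[i-1] < cap_watts <= caps[i]
        x0, x1 = self.caps[i - 1], self.caps[i]
        y0, y1 = self.perf[i - 1], self.perf[i]
        return y0 + (y1 - y0) * (cap_watts - x0) / (x1 - x0)

    def gain(self, current_cap: float, extra_watts: float) -> float:
        """Predicted performance improvement from raising the cap by extra_watts."""
        return self.predict(current_cap + extra_watts) - self.predict(current_cap)


if __name__ == "__main__":
    gpu = CapSensitivityModel()  # made-up GPU samples for one application
    for cap, perf in [(200, 0.70), (300, 0.90), (400, 0.95)]:
        gpu.observe(cap, perf)
    print(gpu.predict(250))      # about 0.80
    print(gpu.gain(200, 100))    # about 0.20
    print(gpu.gain(300, 100))    # about 0.05
```

In this made-up GPU curve, raising the cap from 300 W to 400 W buys far less than raising it from 200 W to 300 W. That kind of per-application cap sensitivity is what a fair-share split of reclaimed power cannot see, and predicted gains like these could be used to fill per-application gain tables for an allocator.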

📝 Abstract
Power-constrained HPC systems increasingly run heterogeneous CPU-GPU applications under strict cluster-wide power limits. Existing cluster-wide power management policies rely on fair-share or utilization heuristics and do not capture application-specific sensitivity to CPU and GPU power caps, leading to inefficient use of reclaimed power. We present EcoShift, a performance-aware cluster-wide power management framework. EcoShift combines online performance prediction with a dynamic-programming-based allocator to distribute reclaimed power across CPU-GPU applications for maximum average performance improvement. Through emulation-based evaluation on two heterogeneous Intel CPU and NVIDIA A100/H100 GPU platforms with diverse CPU-GPU workloads, EcoShift consistently outperforms state-of-the-art policies, achieving up to 6% average performance improvement while preserving the cluster-wide power constraint.
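The abstract does not give the allocator's exact formulation. A minimal sketch, assuming reclaimed power is discretized into equal units and that each application already has predicted normalized performance gains for a few (extra CPU units, extra GPU units) options, treats allocation as a multiple-choice knapsack solved by dynamic programming. Because the number of applications is fixed, maximizing the sum of gains also maximizes the average. The functions `best_split_per_budget` and `allocate` and the gain tables are hypothetical, not EcoShift's code.

```python
from typing import Dict, List, Tuple

# Hypothetical input: predicted gain for each (extra CPU units, extra GPU units) option.
Options = Dict[Tuple[int, int], float]


def best_split_per_budget(options: Options, max_units: int) -> List[Tuple[float, Tuple[int, int]]]:
    """For k = 0..max_units: best gain using at most k units, and its (cpu, gpu) split."""
    best = [(0.0, (0, 0))] * (max_units + 1)  # giving nothing yields zero gain
    for (cpu, gpu), gain in options.items():
        for k in range(cpu + gpu, max_units + 1):  # option fits in every k >= its cost
            if gain > best[k][0]:
                best[k] = (gain, (cpu, gpu))
    return best


def allocate(apps: List[Options], budget: int) -> Tuple[float, List[Tuple[int, int]]]:
    """Split `budget` reclaimed units across apps to maximize average predicted gain."""
    n = len(apps)
    if n == 0:
        return 0.0, []
    tables = [best_split_per_budget(opts, budget) for opts in apps]
    neg = float("-inf")
    # dp[i][b]: best total gain of the first i apps using at most b units.
    dp = [[0.0] * (budget + 1)] + [[neg] * (budget + 1) for _ in range(n)]
    choice = [[0] * (budget + 1) for _ in range(n)]  # units given to app i at budget b
    for i in range(n):
        for b in range(budget + 1):
            for k in range(b + 1):
                value = dp[i][b - k] + tables[i][k][0]
                if value > dp[i + 1][b]:
                    dp[i + 1][b] = value
                    choice[i][b] = k
    # Walk back from the last app to recover each app's (cpu, gpu) split.
    splits: List[Tuple[int, int]] = [(0, 0)] * n
    b = budget
    for i in range(n - 1, -1, -1):
        k = choice[i][b]
        splits[i] = tables[i][k][1]
        b -= k
    return dp[n][budget] / n, splits


if __name__ == "__main__":
    gpu_bound = {(0, 2): 0.10, (0, 4): 0.15, (2, 0): 0.01}
    cpu_bound = {(2, 0): 0.08, (4, 0): 0.09, (0, 2): 0.00}
    print(allocate([gpu_bound, cpu_bound], budget=4))
```

With these made-up tables and 4 units to distribute, the DP gives 2 GPU units to the GPU-bound application and 2 CPU units to the CPU-bound one (total gain about 0.18, average about 0.09) rather than spending all 4 units on the GPU-bound application (0.15). The triple loop costs O(N·B²) for N applications and B units, which stays small at coarse power granularity.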
Problem

Research questions and friction points this paper is trying to address.

power-constrained systems
heterogeneous computing
cluster-wide power management
CPU-GPU applications
performance sensitivity
Innovation

Methods, ideas, or system contributions that make the work stand out.

performance-aware power management
heterogeneous systems
dynamic programming
online performance prediction
power capping