🤖 AI Summary
This study addresses the prevalent yet overlooked "execution-idle" state in GPU clusters: low computational activity coupled with high power draw, which leads to substantial energy waste. The work is the first to formally define execution idleness as a distinct operating state warranting targeted optimization. Using fine-grained, per-second telemetry from large-scale AI clusters, the authors quantify its prevalence and energy impact, finding that it accounts for 19.7% of in-execution time and 10.7% of total energy consumption. To mitigate this inefficiency, two prototype strategies are proposed: automatically scaling down clock frequency during execution-idle periods, and reducing exposure to them by exploiting workload imbalance. Experiments show that both approaches lower energy consumption during execution-idle periods, supporting the case for treating this state as a dedicated target of energy-saving optimization.
📝 Abstract
GPUs are becoming a major contributor to data-center power, yet unlike CPUs they can remain at high power even when visible activity is near zero. We call this state execution-idle. Using per-second telemetry from a large academic AI cluster, we characterize execution-idle as a recurring low-activity yet high-power state in real deployments. Across diverse workloads and multiple GPU generations, it accounts for 19.7% of in-execution time and 10.7% of energy. This suggests the need both to reduce the cost of execution-idle and to reduce exposure to it. We therefore build two prototypes: one automatically downscales clock frequency during execution-idle, and the other exploits load imbalance to reduce exposure; each involves performance trade-offs. These findings suggest that future energy-efficient GPU systems should treat execution-idle as a first-class operating state.
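The characterization above can be sketched in code: given per-second telemetry samples of GPU utilization and power, flag a second as execution-idle when utilization is low while power stays high, then report the share of time and energy spent in that state. The thresholds and the `Sample` record below are illustrative assumptions for the sketch, not the paper's actual definitions or data schema.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    util_pct: float   # GPU utilization for this one-second sample (%)
    power_w: float    # power draw for this sample (W)

def execution_idle_share(samples, util_thresh=5.0, power_thresh=100.0):
    """Return (time_share, energy_share) for execution-idle seconds:
    low visible activity but high power. Thresholds are illustrative."""
    if not samples:
        return 0.0, 0.0
    idle = [s for s in samples
            if s.util_pct < util_thresh and s.power_w > power_thresh]
    # One-second samples, so summed watts approximate energy in joules.
    total_energy = sum(s.power_w for s in samples)
    idle_energy = sum(s.power_w for s in idle)
    return (len(idle) / len(samples),
            idle_energy / total_energy if total_energy else 0.0)

# Hypothetical trace: 8 s busy at 300 W, 2 s execution-idle at 150 W.
trace = [Sample(80.0, 300.0)] * 8 + [Sample(1.0, 150.0)] * 2
time_share, energy_share = execution_idle_share(trace)
```

In this toy trace, execution-idle covers 20% of the time but a smaller share of energy, mirroring the paper's finding that the time share (19.7%) exceeds the energy share (10.7%) because power during execution-idle, while high, is below full-load power.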