🤖 AI Summary
This study addresses the poorly understood mechanisms by which AI data centers built on shared GPU architectures shape grid-facing power, in particular how workload composition affects aggregate power variability and short-horizon ramping differently. The authors propose a modeling framework calibrated with real-world traces that links workload arrivals, queueing dynamics, scheduling policies, and GPU power characteristics to analyze the effect of varying the batch-to-inference mix. As the inference share rises, power variability follows a U-shaped trend while short-horizon ramping follows a hump-shaped one, and both the magnitude and the turning points of these patterns depend on system loading. At intermediate mixes, queued batch jobs fill capacity left idle by fluctuating inference demand, which reduces power variability; ramping nevertheless remains elevated because inference-side fluctuations propagate more directly into realized power. The work shows that workload composition decouples power variability from ramping, so AI data centers are best treated as dynamic systems whose workload mix shapes their grid impact.
📝 Abstract
Artificial intelligence (AI) is driving rapid growth in electricity demand, yet the grid-facing power dynamics of AI data centers remain poorly understood. Here we show that, in shared-GPU systems, the composition of batch and inference workloads decouples aggregate power variability from short-horizon ramping. As the inference share rises, variability becomes U-shaped, whereas ramping becomes hump-shaped, particularly under higher loading. The magnitude and turning points of these patterns also depend on system loading. Using a trace-calibrated framework linking workload arrivals, queueing, scheduling, and GPU power, we show that the underlying mechanism is asymmetric. At intermediate workload mixes, queued batch jobs fill capacity left idle by fluctuating inference demand, reducing aggregate power variability. However, short-horizon ramping remains elevated because inference-side fluctuations propagate more directly into realized power. AI data centers should therefore be understood as dynamic systems whose workload composition shapes their grid impact.
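The backfilling mechanism described above can be illustrated with a minimal toy sketch. It is not the authors' trace-calibrated framework: the function names, the inference-first scheduling rule, the per-GPU idle and busy wattages, and the demand series are all hypothetical values chosen for illustration. It only shows how queued batch work filling idle GPUs can lower the standard deviation of aggregate power, and it does not attempt to reproduce the ramping behavior reported in the abstract.

```python
from statistics import pstdev


def simulate_power(inference_demand, batch_arrivals, capacity,
                   idle_w=100.0, busy_w=400.0):
    """Toy shared-GPU cluster (all parameters are illustrative assumptions).

    Inference is served first each step; queued batch work then backfills
    whatever GPUs are left idle. Returns aggregate power per time step.
    """
    backlog = 0.0  # queued batch work, in GPU-steps
    power = []
    for demand, arrivals in zip(inference_demand, batch_arrivals):
        backlog += arrivals
        inference_busy = min(demand, capacity)
        # Batch jobs can only use capacity that inference left idle.
        batch_busy = min(backlog, capacity - inference_busy)
        backlog -= batch_busy
        busy = inference_busy + batch_busy
        power.append(busy * busy_w + (capacity - busy) * idle_w)
    return power


def variability(power):
    """Population standard deviation of the aggregate power series."""
    return pstdev(power)


# Hypothetical fluctuating inference demand (GPUs requested per step).
inference = [20, 60, 30, 70, 25, 65, 35, 55]

no_batch = simulate_power(inference, [0] * 8, capacity=100)
with_batch = simulate_power(inference, [50] * 8, capacity=100)

print("variability without batch backfill:", variability(no_batch))
print("variability with batch backfill:   ", variability(with_batch))
```

In this toy setup the batch queue absorbs the troughs in inference demand, so the cluster stays close to full utilization and aggregate power varies less than in the inference-only case.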