🤖 AI Summary
High-density AI accelerators can leave traditional datacenter power delivery hierarchies unable to use all the power they provision, which strands capacity. This work presents an evaluation framework that jointly models GPU, compute, and storage deployments and uses production data from Microsoft Azure for arrival, oversubscription, and decommissioning sequences to evaluate power delivery designs on throughput, power, and cost metrics. By accounting for multi-resource stranding in power infrastructure design, the study argues that “deployable capacity” over time, meaning the capacity that can actually be powered and used, is a more meaningful planning objective than installed megawatts. It further quantifies how rising density from rack- and pod-scale AI systems affects deployable capacity, effective capital expenditure, and delivered performance.
📝 Abstract
Demand for AI accelerators is rapidly increasing rack power density, with projections approaching 1 MW per deployment by 2027. This poses a major challenge for datacenter power delivery designers. As power densities increase, a datacenter designed for a different target density may strand power, i.e., may be unable to use all the power that its delivery hierarchy has provisioned. Designs must remain efficient over long datacenter lifetimes and multiple hardware generations. Power utilization is particularly important because grid power capacity is a scarce resource in the AI era. Designing an efficient power delivery hierarchy for the long run is difficult because rack placement feasibility, workload impact, and cost depend jointly on electrical topology, deployment granularity, placement policy, power oversubscription, and workload mix. Moreover, these factors evolve over time, are interdependent across multiple resource dimensions, and generally do not lend themselves to closed-form analysis. To address this challenge, we develop a framework for evaluating datacenter power delivery designs using throughput, power, and cost metrics over realistic arrival, oversubscription, and decommissioning sequences. The framework combines projection models for GPU, compute, and storage deployments with operational factors grounded in production data from Microsoft Azure. Our results show that multi-resource stranding materially changes deployable capacity, effective capital expenditure, and delivered performance, and quantify how rising density from rack- and pod-scale AI systems shapes these outcomes. For AI datacenter design, the relevant planning objective is not installed megawatts, but deployable capacity over time.
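
As a rough illustration of what stranded power means, the toy sketch below places identical racks into rows that each have a power budget and a fixed number of rack positions. It is not the paper's framework: the `Row` type, the first-fit policy, and all numbers (500 kW rows, 10 positions, 130 kW racks) are hypothetical values chosen only to show how provisioned power can go unused once rack density no longer divides evenly into a row's budget.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One row in a toy power hierarchy (all fields are illustrative assumptions)."""
    power_kw: float          # power provisioned to the row
    slots: int               # physical rack positions in the row
    used_kw: float = 0.0
    used_slots: int = 0


def place_first_fit(rows, rack_kw_list):
    """Place each rack in the first row with a free slot and enough power.

    Returns the list of rack powers that could not be placed anywhere.
    """
    rejected = []
    for rack_kw in rack_kw_list:
        for row in rows:
            has_slot = row.used_slots < row.slots
            has_power = row.used_kw + rack_kw <= row.power_kw
            if has_slot and has_power:
                row.used_kw += rack_kw
                row.used_slots += 1
                break
        else:
            # No row could host this rack.
            rejected.append(rack_kw)
    return rejected


def stranded_kw(rows):
    """Provisioned power that no placed rack is drawing on."""
    return sum(row.power_kw - row.used_kw for row in rows)


# Hypothetical hall: 4 rows of 500 kW and 10 positions each, 16 racks of 130 kW.
rows = [Row(power_kw=500, slots=10) for _ in range(4)]
rejected = place_first_fit(rows, [130] * 16)

provisioned = sum(row.power_kw for row in rows)
deployed = sum(row.used_kw for row in rows)
print(f"provisioned={provisioned} kW, deployed={deployed} kW, "
      f"stranded={stranded_kw(rows)} kW, rejected racks={len(rejected)}")
```

In this toy setup each row can host only three 130 kW racks, so part of every row's budget stays unused even though racks are still waiting to be placed. The gap between provisioned and deployed power is the kind of quantity the abstract contrasts with installed megawatts; the paper's framework additionally models oversubscription, decommissioning, and multiple resource dimensions, none of which this sketch attempts.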