🤖 AI Summary
This work addresses the distributed multi-objective scheduling of LLM inference tasks across heterogeneous edge data centers under spatiotemporal dynamics, including volatile local renewable generation, time-varying electricity prices, and water resource constraints. We propose the first holistic optimization framework that jointly minimizes energy consumption, carbon emissions, water usage, and user-perceived latency. The approach integrates the environmental triad (electricity–carbon–water) into a unified model, explicitly capturing the spatiotemporal correlation of renewables, hardware heterogeneity across data centers, and strict QoS requirements (e.g., latency SLAs). The problem is formulated as a mixed-integer nonlinear program (MINLP) and solved with a tailored decomposition-based algorithm. Experiments on real-world traces show that the method reduces operational cost by 19.7%, carbon emissions by 23.4%, and water consumption by 16.2% on average over baselines, while meeting low-latency SLAs for over 98.5% of requests, significantly advancing the system-level sustainability and practicality of green AI inference.
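To make the scheduling problem concrete, here is a minimal sketch (not the paper's algorithm) of per-timestep request routing across edge DCs: each DC is scored by a weighted sum of energy cost, carbon, and water per request, and requests go greedily to the cheapest SLA-feasible DC with spare capacity. All names, weights, and parameter values (`price`, `carbon_intensity`, `water_per_kwh`, `latency_ms`, `capacity`, `energy_per_req`) are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class DataCenter:
    name: str
    price: float             # $/kWh of grid draw, net of local renewables
    carbon_intensity: float  # kgCO2/kWh of grid draw
    water_per_kwh: float     # liters/kWh (cooling + embedded)
    latency_ms: float        # user-perceived latency to this DC
    capacity: int            # requests this DC can absorb in the timestep
    energy_per_req: float    # kWh per inference request

def score(dc, w_energy=1.0, w_carbon=0.5, w_water=0.2):
    """Weighted environmental cost of serving one request at this DC."""
    e = dc.energy_per_req
    return (w_energy * dc.price * e
            + w_carbon * dc.carbon_intensity * e
            + w_water * dc.water_per_kwh * e)

def allocate(requests, dcs, sla_ms=200.0):
    """Greedily route requests to the cheapest SLA-feasible DC with capacity."""
    alloc = {dc.name: 0 for dc in dcs}
    # Drop DCs that would violate the latency SLA, then sort by unit cost.
    feasible = sorted((dc for dc in dcs if dc.latency_ms <= sla_ms), key=score)
    for _ in range(requests):
        for dc in feasible:
            if alloc[dc.name] < dc.capacity:
                alloc[dc.name] += 1
                break
    return alloc
```

A greedy rule like this ignores the temporal coupling (e.g., shifting load toward hours of renewable surplus) that the full MINLP captures; it only illustrates the per-request trade-off among the three environmental objectives under a latency constraint.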
📝 Abstract
This letter investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers (DCs) over time. Each DC features on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. The central question is: how can inference workloads be distributed across the DCs to minimize energy consumption, carbon emissions, and water usage while enhancing user experience? To this end, we propose a novel optimization model that enables LLM service providers to reduce operational costs and environmental impacts. Numerical results validate the efficacy of the proposed approach.
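As a schematic illustration only (the letter's exact formulation is not reproduced here), a spatiotemporal allocation problem of this kind is often written as a weighted multi-objective program over integer routing decisions $x_{i,t}$ (requests sent to DC $i$ in slot $t$); all symbols below are generic placeholders, not the paper's notation:

```latex
\begin{aligned}
\min_{x_{i,t}\,\in\,\mathbb{Z}_{\ge 0}} \quad
  & \sum_{t}\sum_{i}\bigl(\alpha\, p_{i,t} + \beta\, c_{i,t} + \gamma\, w_{i,t}\bigr)\, E_{i}\, x_{i,t}
    \;+\; \delta \sum_{t}\sum_{i} \ell_{i,t}\, x_{i,t} \\
\text{s.t.} \quad
  & \sum_{i} x_{i,t} = D_{t} && \text{(all demand served each slot)} \\
  & x_{i,t} \le C_{i} && \text{(per-DC capacity)} \\
  & \ell_{i,t}\, \mathbb{1}\{x_{i,t} > 0\} \le L_{\max} && \text{(latency SLA)}
\end{aligned}
```

Here $p_{i,t}$, $c_{i,t}$, and $w_{i,t}$ denote electricity price, carbon intensity, and water intensity (each net of local renewables), $E_i$ the energy per request, $\ell_{i,t}$ the latency, $D_t$ the demand, and $\alpha,\beta,\gamma,\delta$ the objective weights; nonlinearity in the real model would enter through, e.g., load-dependent power or cooling curves.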