Green-LLM: Optimal Workload Allocation for Environmentally-Aware Distributed Inference

📅 2025-07-14
🤖 AI Summary
This work addresses the distributed multi-objective scheduling of LLM inference tasks across heterogeneous edge data centers under spatiotemporal dynamics—including volatile local renewable generation, time-varying electricity pricing, and water resource constraints. We propose the first holistic optimization framework jointly minimizing energy consumption, carbon emissions, water usage, and user-perceived latency. Our approach uniquely integrates the environmental triad (electricity–carbon–water) into a unified model, explicitly capturing spatiotemporal correlations of renewables, hardware heterogeneity across data centers, and strict QoS requirements (e.g., latency SLAs). Formulated as a mixed-integer nonlinear program (MINLP), the problem is solved via a tailored decomposition-based algorithm. Experiments on real-world traces demonstrate that our method reduces operational cost by 19.7%, carbon emissions by 23.4%, and water consumption by 16.2% on average over baselines, while satisfying low-latency SLAs for over 98.5% of requests—significantly advancing system-level sustainability and practicality of green AI inference.
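The core idea, stripped of the MINLP machinery, is a multi-objective allocation: distribute inference workload across data centers so that a weighted combination of energy, carbon, and water footprints is minimized subject to capacity and demand constraints. A minimal sketch of that idea as a linear relaxation is below; all numbers, weights, and the weighted-sum scalarization are illustrative assumptions, not the paper's actual formulation or data.

```python
# Illustrative sketch only: a linear relaxation of the allocation idea.
# The paper's model is a MINLP solved by a tailored decomposition algorithm;
# here we use a weighted-sum scalarization over made-up per-unit footprints.
import numpy as np
from scipy.optimize import linprog

# Three hypothetical edge DCs with assumed per-unit-workload footprints.
energy = np.array([1.0, 1.3, 0.9])   # kWh per unit workload
carbon = np.array([0.4, 0.2, 0.5])   # kg CO2 per unit (net of on-site renewables)
water  = np.array([1.8, 2.2, 1.5])   # liters per unit (cooling)
capacity = np.array([60.0, 50.0, 40.0])  # max workload units per DC
demand = 100.0                            # total workload to place

# Scalarize the electricity-carbon-water triad with assumed weights.
w_e, w_c, w_w = 1.0, 2.0, 0.5
cost = w_e * energy + w_c * carbon + w_w * water

# minimize cost @ x  s.t.  sum(x) == demand,  0 <= x <= capacity
res = linprog(cost,
              A_eq=np.ones((1, 3)), b_eq=[demand],
              bounds=list(zip(np.zeros(3), capacity)))
allocation = res.x  # workload assigned to each DC
```

With these numbers the cheapest DCs fill up first, so the solver routes load to DC3 and DC1 and leaves DC2 idle; in the paper, integer routing decisions, latency SLAs, and time-varying prices make this a much harder MINLP.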

📝 Abstract
This letter investigates the optimal allocation of large language model (LLM) inference workloads across heterogeneous edge data centers (DCs) over time. Each DC features on-site renewable generation and faces dynamic electricity prices and spatiotemporal variability in renewable availability. The central question is: how can inference workloads be optimally distributed to the DCs to minimize energy consumption, carbon emissions, and water usage while enhancing user experience? This letter proposes a novel optimization model for LLM service providers to reduce operational costs and environmental impacts. Numerical results validate the efficacy of the proposed approach.
Problem

Research questions and friction points this paper is trying to address.

Optimal workload allocation for LLM inference in edge data centers
Minimize energy, carbon, and water usage while improving user experience
Address dynamic electricity prices and renewable availability variability
Innovation

Methods, ideas, or system contributions that make the work stand out.

First unified electricity–carbon–water model for LLM inference allocation across heterogeneous edge DCs
Explicit modeling of dynamic electricity pricing and spatiotemporal renewable availability
MINLP formulation solved by a tailored decomposition-based algorithm