Sustainable Carbon-Aware and Water-Efficient LLM Scheduling in Geo-Distributed Cloud Datacenters

📅 2025-05-29

🤖 AI Summary

Large language model (LLM) inference across geographically distributed cloud datacenters incurs high carbon emissions, substantial freshwater consumption, and degraded quality of service (QoS), specifically elevated first-token latency. To address these problems, this paper proposes SLIT, the first framework enabling joint multi-objective optimization of carbon footprint, freshwater usage, and QoS. Methodologically, SLIT integrates a machine-learning-enhanced metaheuristic algorithm, a geography-aware real-time scheduling mechanism, and a dynamic multi-dimensional environmental model that captures carbon intensity, water stress, and energy supply, coupled with adaptive weight tuning. Experimental evaluation demonstrates that SLIT meets stringent first-token latency guarantees while significantly reducing cumulative carbon emissions and freshwater consumption during inference. Overall sustainability improves by over 40%, surpassing conventional energy-efficiency-centric optimization paradigms.

📝 Abstract

In recent years, Large Language Models (LLMs) such as ChatGPT, Copilot, and Gemini have been widely adopted in different areas. As the use of LLMs continues to grow, many efforts have focused on reducing the massive training overheads of these models. But it is the environmental impact of handling user requests to LLMs that is increasingly becoming a concern. Recent studies estimate that the costs of operating LLMs in their inference phase can exceed training costs by 25x per year. As LLMs are queried incessantly, the cumulative carbon footprint of the operational phase has been shown to far exceed the footprint of the training phase. Further, estimates indicate that 500 ml of fresh water is expended for every 20-50 requests to LLMs during inference. To address these important sustainability issues with LLMs, we propose a novel framework called SLIT to co-optimize LLM quality of service (time-to-first-token), carbon emissions, water usage, and energy costs. The framework utilizes a machine learning (ML) based metaheuristic to enhance the sustainability of LLM hosting across geo-distributed cloud datacenters. Such a framework will become increasingly vital as LLMs proliferate.
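
To make the water estimate in the abstract concrete, the short calculation below converts "500 ml for every 20-50 requests" into a per-request range. The function name and the one-million-request workload are illustrative assumptions, not figures from the paper.

```python
def water_per_request_ml(total_ml=500.0, requests_low=20, requests_high=50):
    """Return the (min, max) water use per request implied by the estimate.

    The defaults restate the abstract's figure: 500 ml per 20-50 requests.
    """
    # More requests per 500 ml means less water per request, and vice versa.
    return total_ml / requests_high, total_ml / requests_low


low_ml, high_ml = water_per_request_ml()
print(low_ml, high_ml)  # 10.0 25.0

# Hypothetical workload of one million requests (an assumption for scale only).
requests = 1_000_000
print(low_ml * requests / 1000, high_ml * requests / 1000)  # litres: 10000.0 25000.0
```

Under that estimate, each request accounts for roughly 10 to 25 ml of fresh water, so a hypothetical one million requests would correspond to about 10,000 to 25,000 litres.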

Problem

Research questions and friction points this paper is trying to address.

Reducing carbon emissions from LLM inference operations

Minimizing water usage during LLM request processing

Optimizing energy costs and QoS in geo-distributed datacenters

Innovation

Methods, ideas, or system contributions that make the work stand out.

ML-based metaheuristic for sustainable LLM scheduling

Co-optimizes QoS, carbon, water, and energy

Geo-distributed datacenter resource management
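
The co-optimization idea listed above can be illustrated with a minimal sketch. It is not SLIT's ML-based metaheuristic, whose details are not given on this page; it is a plain weighted-sum selection that treats time-to-first-token as a hard limit and trades off carbon, water, and energy cost among the remaining sites. The `Datacenter` fields, the weights, and every number below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datacenter:
    # All fields are illustrative per-request estimates, not data from the paper.
    name: str
    ttft_ms: float           # expected time-to-first-token
    carbon_g: float          # grams of CO2-equivalent per request
    water_ml: float          # millilitres of fresh water per request
    cost_usd: float          # energy cost per request


def normalise(values):
    """Min-max scale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def pick_datacenter(datacenters, weights, ttft_limit_ms):
    """Pick the feasible site with the lowest weighted, normalised score.

    `weights` maps a Datacenter field name to its (assumed) importance.
    """
    feasible = [d for d in datacenters if d.ttft_ms <= ttft_limit_ms]
    if not feasible:
        raise ValueError("no datacenter meets the time-to-first-token limit")
    scores = [0.0] * len(feasible)
    for field, weight in weights.items():
        column = normalise([getattr(d, field) for d in feasible])
        for i, value in enumerate(column):
            scores[i] += weight * value
    best = min(range(len(feasible)), key=scores.__getitem__)
    return feasible[best]


sites = [
    Datacenter("A", ttft_ms=120, carbon_g=0.9, water_ml=12, cost_usd=0.0020),
    Datacenter("B", ttft_ms=180, carbon_g=0.3, water_ml=20, cost_usd=0.0015),
    Datacenter("C", ttft_ms=400, carbon_g=0.1, water_ml=5, cost_usd=0.0010),
]
weights = {"ttft_ms": 0.1, "carbon_g": 0.5, "water_ml": 0.2, "cost_usd": 0.2}

print(pick_datacenter(sites, weights, ttft_limit_ms=250).name)  # B
print(pick_datacenter(sites, weights, ttft_limit_ms=500).name)  # C
```

With a 250 ms limit, site C is excluded despite being the greenest, and B wins on carbon; relaxing the limit to 500 ms lets C through. A fixed weighted sum like this is only one simple way to scalarize competing objectives; the page describes SLIT as using adaptive weight tuning and a metaheuristic search instead.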
