🤖 AI Summary
Existing LLM inference simulation frameworks, increasingly used to plan efficient deployments, lack power modeling and therefore cannot accurately estimate inference-related carbon emissions. This paper presents a high-fidelity simulation framework that integrates a fine-grained, utilization-based GPU power model with dynamic grid characteristics, enabling joint quantification of energy consumption and carbon emissions across the full LLM inference pipeline, as well as carbon-aware scheduling analysis. By unifying hardware-level power modeling, LLM inference performance simulation, and energy system co-simulation, the framework supports interpretable, parameter-level attribution of inference configurations (e.g., batch size, sequence length, model parallelism) to carbon impact, along with evaluation of renewable energy integration potential. In an illustrative deployment case, it demonstrates a renewable electricity offset potential of up to 69.2%, providing a verifiable, quantitative tool and evidence-based decision support for low-carbon AI infrastructure design.
📝 Abstract
The environmental impact of Large Language Models (LLMs) is rising significantly, with inference now accounting for more than half of their total lifecycle carbon emissions. However, existing simulation frameworks, which are increasingly used to determine efficient LLM deployments, lack any concept of power and, therefore, cannot accurately estimate inference-related emissions. We present a simulation framework to assess the energy and carbon implications of LLM inference under varying deployment setups. First, we extend a high-fidelity LLM inference simulator with a GPU power model that estimates power consumption based on utilization metrics, enabling analysis across configurations like batch size, sequence length, and model parallelism. Second, we integrate simulation outputs into an energy system co-simulation environment to quantify carbon emissions under specific grid conditions and explore the potential of carbon-aware scheduling. Through scenario-based analysis, our framework reveals how inference parameters affect energy demand and carbon footprint, demonstrates a renewable offset potential of up to 69.2% in an illustrative deployment case, and provides a foundation for future carbon-aware inference infrastructure design.
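To make the core idea concrete, here is a minimal sketch of a utilization-based GPU power model coupled with a grid carbon-intensity term. This is an illustration of the general approach, not the paper's actual model: the linear power curve, the idle/peak wattages, and the example grid intensity are all assumed values for demonstration.

```python
def gpu_power_watts(utilization, p_idle=60.0, p_max=400.0):
    """Linear interpolation between idle and peak draw.

    p_idle and p_max are hypothetical figures for a data-center GPU;
    a fine-grained model would calibrate these per device and workload.
    """
    return p_idle + (p_max - p_idle) * utilization

def inference_emissions_g(utilization, duration_s, grid_gco2_per_kwh):
    """Emissions = energy (kWh) x grid carbon intensity (gCO2/kWh)."""
    energy_kwh = gpu_power_watts(utilization) * duration_s / 3.6e6
    return energy_kwh * grid_gco2_per_kwh

# Example: one GPU at 80% utilization for one hour on a grid
# with an assumed intensity of 300 gCO2/kWh.
emissions = inference_emissions_g(0.8, 3600, 300)  # -> 99.6 g CO2
```

In the framework described above, the utilization input would come from the LLM inference simulator (as a function of batch size, sequence length, and parallelism), and the grid intensity would be supplied dynamically by the energy system co-simulation, which is what enables carbon-aware scheduling analysis.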