AI Summary
This work addresses the lack of fine-grained, accurate methods for measuring carbon emissions during large language model (LLM) inference, a gap that hinders informed sustainability decisions. To bridge it, the paper introduces the first systematic reference framework to guide the design of carbon estimation tools for LLM inference and presents SEAL, an early implementation enabling per-prompt carbon footprint assessment. SEAL integrates multi-benchmark-driven modeling, fine-grained energy consumption mapping, and a dedicated carbon estimation algorithm, substantially improving both accuracy and generalizability. Preliminary experiments demonstrate its effectiveness, laying the groundwork for standardized, reproducible sustainability evaluation within the LLM ecosystem.
Abstract
Large Language Models are rapidly gaining traction in software engineering, yet their growing carbon footprint raises pressing sustainability concerns. While training emissions are substantial, emissions from inference quickly surpass them due to the sheer volume of prompts processed. This shift underscores the urgent need for accurate, prompt-level carbon measurement during inference to enable informed, sustainability-focused decision-making. To address the limitations of existing approaches, we outline in this paper the guiding principles for a novel reference framework for LLM inference carbon estimation, which can guide the design of future tools and provide a systematic foundation for advancing sustainability research in this domain. We also introduce SEAL, an early embodiment of these principles that leverages a multi-benchmark-driven approach for per-prompt carbon estimation. Its initial validation shows promising results, positioning SEAL as a foundation for standardized sustainability assessment across the LLM ecosystem.