Breaking the ICE: Exploring promises and challenges of benchmarks for Inference Carbon&Energy estimation for LLMs

📅 2025-06-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language model (LLM) inference incurs substantial energy consumption and carbon emissions, yet existing estimation tools are intrusive, low-accuracy, and require cumbersome input configurations. Method: We propose R-ICE, a lightweight, non-intrusive modeling framework that systematically exploits transferable signals from public LLM benchmark datasets (e.g., OpenLLM, LMSys) for carbon estimation. R-ICE establishes an end-to-end, prompt-level pipeline for fine-grained energy and carbon emission estimation, integrating feature engineering, regression modeling, and dynamic carbon intensity mapping. Contribution/Results: R-ICE enables novel applications such as dynamic LLM routing and carbon accounting, with low runtime overhead, high adaptability, and cross-hardware generalizability. Extensive experiments across diverse models and hardware configurations demonstrate an average estimation error of <12%, significantly outperforming conventional monitoring approaches. R-ICE provides a scalable, infrastructure-ready foundation for green AI evaluation.
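The pipeline described above — a regression-style energy model over prompt-level features, combined with dynamic grid carbon intensity — can be sketched as follows. This is a minimal illustration, not the R-ICE implementation: the linear feature set (input/output token counts plus a fixed overhead) and all coefficient values are invented assumptions for demonstration.

```python
# Hedged sketch of prompt-level carbon estimation in the spirit of R-ICE.
# Coefficients and feature choices below are illustrative assumptions,
# NOT the paper's fitted values.
from dataclasses import dataclass

@dataclass
class EnergyModel:
    """Linear energy model, hypothetically fitted offline on benchmark telemetry."""
    joules_per_input_token: float
    joules_per_output_token: float
    fixed_overhead_j: float

    def estimate_energy_j(self, n_in: int, n_out: int) -> float:
        # Energy as a linear function of prompt-level token counts.
        return (self.fixed_overhead_j
                + self.joules_per_input_token * n_in
                + self.joules_per_output_token * n_out)

def estimate_emissions_g(model: EnergyModel, n_in: int, n_out: int,
                         grid_intensity_g_per_kwh: float) -> float:
    """Map estimated energy (J) to grams CO2e via dynamic grid carbon intensity."""
    kwh = model.estimate_energy_j(n_in, n_out) / 3.6e6  # 1 kWh = 3.6e6 J
    return kwh * grid_intensity_g_per_kwh

# Usage with made-up numbers: 200 input tokens, 300 output tokens,
# grid intensity 400 gCO2e/kWh.
m = EnergyModel(joules_per_input_token=0.5,
                joules_per_output_token=4.0,
                fixed_overhead_j=50.0)
print(estimate_emissions_g(m, n_in=200, n_out=300,
                           grid_intensity_g_per_kwh=400.0))  # → 0.15
```

Keeping the energy model and the carbon-intensity mapping separate mirrors the paper's framing: the regression can be fit once per model/hardware pair, while the grid intensity term varies with time and location.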

📝 Abstract
While Generative AI stands to be one of the fastest-adopted technologies ever, studies have made evident that the use of Large Language Models (LLMs) places a significant burden on energy grids and the environment, and may hinder the sustainability goals of any organization. A crucial step in any sustainability strategy is monitoring or estimating the energy consumption of its components. While multiple tools exist for monitoring energy consumption, there is a dearth of tools and frameworks for estimating consumption or carbon emissions. Current drawbacks of both monitoring and estimation tools include the large number of required input data points, their intrusive nature, and high error margins. We posit that leveraging emerging LLM benchmarks and related data points can help overcome these challenges while preserving the accuracy of the emission estimates. To that end, we discuss the challenges of current approaches and present our evolving framework, R-ICE, which estimates prompt-level inference carbon emissions by leveraging existing state-of-the-art (SOTA) benchmarks. This direction provides a more practical and non-intrusive way to enable emerging use cases such as dynamic LLM routing and carbon accounting. Our promising validation results suggest that benchmark-based modelling holds great potential for inference emission estimation and warrants further exploration by the scientific community.
Problem

Research questions and friction points this paper is trying to address.

Estimating carbon emissions from LLM inference tasks
Addressing lack of tools for energy and carbon estimation
Improving accuracy and practicality in emission monitoring
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLM benchmarks for emission estimation
Non-intrusive prompt-level carbon emission framework
Dynamic LLM routing via benchmark-based modeling
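The dynamic-routing idea in the bullets above can be sketched as a simple policy: among models whose benchmark quality score clears a floor, pick the one with the lowest predicted emissions for the request. The model names, quality scores, and per-token emission factors below are invented for illustration; a real router would draw them from leaderboard data and a fitted energy model.

```python
# Hedged sketch of carbon-aware LLM routing. All entries in MODELS are
# hypothetical; quality would come from benchmark scores (e.g. a leaderboard)
# and g_co2e_per_token from a benchmark-based emission estimator.

def route(models, est_tokens, min_quality):
    """Return the name of the admissible model with the lowest predicted emissions."""
    candidates = [m for m in models if m["quality"] >= min_quality]
    if not candidates:
        raise ValueError("no model meets the quality floor")
    # Predicted emissions for this request = per-token factor x expected tokens.
    return min(candidates, key=lambda m: m["g_co2e_per_token"] * est_tokens)["name"]

MODELS = [
    {"name": "small-7b",   "quality": 0.62, "g_co2e_per_token": 0.0004},
    {"name": "medium-13b", "quality": 0.71, "g_co2e_per_token": 0.0009},
    {"name": "large-70b",  "quality": 0.80, "g_co2e_per_token": 0.0045},
]

print(route(MODELS, est_tokens=500, min_quality=0.70))  # → medium-13b
```

Because the emission estimate is non-intrusive and computed before inference, this selection can run per prompt with negligible overhead, which is what makes benchmark-based estimation practical for routing.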