🤖 AI Summary
The absence of a standardized basis for quantifying the carbon emissions of large language model (LLM) inference services hinders fair sustainability comparisons across models, configurations, and hardware.
Method: This paper introduces FUEL, the first functional unit (FU)-driven framework for assessing the carbon impact of LLM serving. FUEL normalizes environmental impact to a standardized FU, “one effective inference task”, so that carbon accounting is comparable across models, configurations, and hardware platforms. It combines FU-based modeling, cradle-to-gate carbon footprint analysis, and empirical benchmarking along three dimensions: latency, energy consumption, and CO₂-equivalent emissions.
Contribution/Results: FUEL establishes a standardized FU-based metric paradigm; empirically uncovers sustainability trade-offs among model scale, quantization precision, and hardware selection; and demonstrates that synergistic optimization—including model lightweighting, operator-level acceleration, and hardware-aware deployment—reduces service carbon emissions by 30–65%. FUEL provides a reusable evaluation standard and systematic optimization methodology for green AI.
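To make the FU idea concrete, the following is a minimal sketch of per-FU carbon accounting, not FUEL's actual implementation. It assumes that operational carbon is energy multiplied by grid carbon intensity, that embodied hardware carbon has already been amortized over the run, and that a request counts as "effective" when it meets some chosen quality and latency bar. The `ServingRun` fields, the `carbon_per_fu` helper, and all numbers are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass
class ServingRun:
    """Measurements from one serving configuration (all fields are hypothetical)."""
    name: str
    energy_kwh: float        # energy drawn while serving the workload
    embodied_gco2e: float    # hardware manufacturing carbon amortized over the run
    requests_served: int     # every request that completed
    effective_requests: int  # requests that met the assumed quality/latency bar


def carbon_per_fu(run: ServingRun, grid_gco2e_per_kwh: float) -> float:
    """Return gCO2e per functional unit, taking one effective request as the FU."""
    if run.effective_requests <= 0:
        raise ValueError("no effective requests; carbon per FU is undefined")
    operational = run.energy_kwh * grid_gco2e_per_kwh
    return (operational + run.embodied_gco2e) / run.effective_requests


# Illustrative, made-up measurements for two configurations.
runs = [
    ServingRun("large-fp16", energy_kwh=2.0, embodied_gco2e=100.0,
               requests_served=1000, effective_requests=950),
    ServingRun("small-int4", energy_kwh=0.5, embodied_gco2e=40.0,
               requests_served=1000, effective_requests=600),
]
GRID_GCO2E_PER_KWH = 400.0  # assumed grid carbon intensity

for run in runs:
    total = run.energy_kwh * GRID_GCO2E_PER_KWH + run.embodied_gco2e
    per_request = total / run.requests_served
    per_fu = carbon_per_fu(run, GRID_GCO2E_PER_KWH)
    print(f"{run.name}: {per_request:.3f} gCO2e/request, {per_fu:.3f} gCO2e/FU")
```

Dividing by effective requests rather than all requests is the design choice that matters here: a configuration that looks cheap per raw request can lose much of its advantage once requests that miss the quality or latency bar no longer count toward the denominator.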
📝 Abstract
Large language models (LLMs) offer powerful capabilities but come with significant environmental costs, particularly in carbon emissions. Existing studies benchmark these emissions but lack a standardized basis for comparison across models. To address this, we introduce the concept of a functional unit (FU) and develop FUEL, the first FU-based framework for evaluating LLM serving's environmental impact. Through case studies on model size, quantization, and hardware, we uncover key trade-offs in sustainability. Our findings highlight the potential for reducing carbon emissions by optimizing model selection, deployment strategies, and hardware choices, paving the way for more sustainable AI infrastructure.