🤖 AI Summary
Problem: Empirical measurements of the environmental impact of AI services in production, covering energy, carbon, and water consumption, remain scarce.
Method: This paper presents the first full-stack, multi-dimensional quantification of the environmental footprint of generative AI inference for Gemini, a large-scale production AI assistant. "Full-stack" here spans AI accelerators through data center overhead. The approach combines fine-grained power monitoring, regionally resolved grid carbon intensity data, and a water footprint model to account for energy, emissions, and water on a per-request basis.
Contribution/Results: We introduce a comprehensive measurement methodology for the environmental footprint of production AI serving, which the authors present as the first of its kind. We show that software efficiency work combined with clean energy procurement substantially reduces environmental burden: the median text request consumes 0.24 Wh of energy, with a corresponding water footprint of 0.26 mL. Over one year, energy consumption for the median text request fell 33×, and its carbon footprint fell 44×.
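The accounting described in the Method paragraph can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the class and function names are hypothetical, the component values in the example are placeholders rather than measured data, and whether the water intensity is applied to IT energy or to total energy is an assumption left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptEnergy:
    """Per-prompt IT energy in watt-hours (hypothetical structure).

    The three fields mirror the components named in the abstract:
    active accelerator power, host system energy, idle machine capacity.
    """
    accelerator_wh: float
    host_wh: float
    idle_wh: float

    def it_energy_wh(self) -> float:
        # IT energy is the sum of all machine-level components.
        return self.accelerator_wh + self.host_wh + self.idle_wh


def total_energy_wh(prompt: PromptEnergy, pue: float) -> float:
    """Scale IT energy by PUE to include data center energy overhead."""
    if pue < 1.0:
        raise ValueError("PUE is defined as total/IT energy and cannot be below 1.0")
    return prompt.it_energy_wh() * pue


def carbon_g(energy_wh: float, grid_g_per_kwh: float) -> float:
    """Emissions in grams of CO2e, given grid carbon intensity in g/kWh."""
    return energy_wh / 1000.0 * grid_g_per_kwh


def water_ml(energy_wh: float, water_l_per_kwh: float) -> float:
    """Water in millilitres; L/kWh and mL/Wh are numerically identical."""
    return energy_wh * water_l_per_kwh


# Placeholder inputs for illustration only; none are taken from the paper.
example = PromptEnergy(accelerator_wh=0.10, host_wh=0.05, idle_wh=0.02)
example_total_wh = total_energy_wh(example, pue=1.1)
example_carbon_g = carbon_g(example_total_wh, grid_g_per_kwh=400.0)
example_water_ml = water_ml(example.it_energy_wh(), water_l_per_kwh=1.0)
```

Keeping the components separate before applying PUE makes it visible how much of a per-prompt figure comes from accelerators alone, which is the gap the paper attributes to narrower estimates.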
📝 Abstract
The transformative power of AI is undeniable, but as user adoption accelerates, so does the need to understand and mitigate the environmental impact of AI serving. However, no studies have measured AI serving environmental metrics in a production environment. This paper addresses this gap by proposing and executing a comprehensive methodology for measuring the energy usage, carbon emissions, and water consumption of AI inference workloads in a large-scale AI production environment. Our approach accounts for the full stack of AI serving infrastructure, including active AI accelerator power, host system energy, idle machine capacity, and data center energy overhead. Through detailed instrumentation of Google's AI infrastructure for serving the Gemini AI assistant, we find the median Gemini Apps text prompt consumes 0.24 Wh of energy, a figure substantially lower than many public estimates. We also show that Google's software efficiency efforts and clean energy procurement have driven a 33x reduction in energy consumption and a 44x reduction in carbon footprint for the median Gemini Apps text prompt over one year. We identify that the median Gemini Apps text prompt uses less energy (0.24 Wh) than watching nine seconds of television and consumes the equivalent of five drops of water (0.26 mL). While these impacts are low compared to other daily activities, reducing the environmental impact of AI serving continues to warrant important attention. Towards this objective, we propose that a comprehensive measurement of AI serving environmental metrics is critical for accurately comparing models and for properly incentivizing efficiency gains across the full AI serving stack.
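The everyday equivalences in the abstract can be checked with simple unit arithmetic. The sketch below uses only the figures quoted above (0.24 Wh, 0.26 mL, nine seconds, five drops, 33x); the function and variable names are hypothetical, and the year-earlier energy value is an inference that assumes 0.24 Wh is the end-of-period figure, not a number reported in the abstract.

```python
ENERGY_WH = 0.24        # median text prompt energy, from the abstract
WATER_ML = 0.26         # median text prompt water use, from the abstract
TV_SECONDS = 9          # television comparison, from the abstract
WATER_DROPS = 5         # water-drop comparison, from the abstract
ENERGY_REDUCTION = 33   # one-year reduction factor, from the abstract


def implied_power_w(energy_wh: float, seconds: float) -> float:
    """Average power in watts that uses `energy_wh` in `seconds`."""
    return energy_wh * 3600.0 / seconds


# A television must draw at least this much for the comparison to hold
# (about 96 W).
tv_power_w = implied_power_w(ENERGY_WH, TV_SECONDS)

# Volume per drop implied by "five drops" (about 0.052 mL).
drop_volume_ml = WATER_ML / WATER_DROPS

# Water per unit of total energy (about 1.08 mL/Wh, i.e. about 1.08 L/kWh).
water_ml_per_wh = WATER_ML / ENERGY_WH

# Inferred energy per median prompt one year earlier (about 7.9 Wh),
# assuming 0.24 Wh is the figure after the 33x reduction.
energy_year_earlier_wh = ENERGY_WH * ENERGY_REDUCTION
```

The implied television power of roughly 96 W and drop volume of roughly 0.05 mL are both in a plausible range, so the comparisons are at least internally consistent with the headline numbers.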