🤖 AI Summary
This study addresses a gap in generative AI research by providing a fine-grained environmental assessment of the full development process of Moshi, a 7B-parameter multimodal (speech-text) large language model. The analysis extends beyond the commonly reported final training run to include often-overlooked activities such as early experiments, failed training runs, debugging, and ablation studies. Combining a detailed accounting of GPU time with a life cycle assessment (LCA) methodology, the work quantifies energy consumption, water consumption, greenhouse gas emissions, and mineral resource depletion associated with the production and use of datacenter hardware. Based on these findings, the authors offer actionable guidelines for reducing the compute usage and environmental impacts of future multimodal LLM research.
📝 Abstract
New multi-modal large language models (MLLMs) are continuously being trained and deployed, following rapid development cycles. This generative AI (GenAI) frenzy is driving steady increases in energy consumption, greenhouse gas emissions, and a plethora of other environmental impacts linked to datacenter construction and hardware manufacturing. Mitigating the environmental consequences of GenAI remains challenging due to an overall lack of transparency by the main actors in the field. Even when the environmental impacts of specific models are mentioned, they are typically restricted to the carbon footprint of the final training run, omitting the research and development stages.
In this work, we explore the impact of GenAI research through a fine-grained analysis of the compute spent to create Moshi, a 7B-parameter speech-text foundation model for real-time dialogue developed by Kyutai, a leading privately funded open science AI lab. For the first time, our study dives into the anatomy of compute-intensive MLLM research, quantifying the GPU-time invested in specific model components and training phases, as well as early experimental stages, failed training runs, debugging, and ablation studies. Additionally, we assess the environmental impacts of creating Moshi from beginning to end using a life cycle assessment methodology: we quantify energy and water consumption, greenhouse gas emissions, and mineral resource depletion associated with the production and use of datacenter hardware.
Our detailed analysis allows us to provide actionable guidelines to reduce compute usage and environmental impacts of MLLM research, paving the way for more sustainable AI research.