🤖 AI Summary
This study addresses a gap in current environmental impact assessments of large language models, which typically cover only pretraining and neglect post-training phases such as supervised fine-tuning, preference optimization, and reinforcement learning. It presents the first detailed lifecycle analysis of the Olmo 3 model family (7B and 32B parameters, in instruction-following and reasoning variants), accounting for energy use across all development stages, including experimental runs, failed trials, and ablation studies that do not contribute to the final model. Using end-to-end tracking of datacenter energy, carbon emissions, and water consumption, the analysis finds that 82.2% of total compute went to development work outside the final models, and that post-training is energy-intensive: reasoning models cost 17x more datacenter energy to post-train than their instruction-tuned counterparts, driven by reinforcement learning rollout generation. The entire development process consumed approximately 12.3 GWh of electricity, emitted 4,251 tCO2eq, and consumed 15,887 kL of water, costs that model developers almost never report.
📝 Abstract
Modern language model development extends far beyond pretraining, yet environmental reporting remains narrowly focused on the cost of training a single final model. In this work, we provide the first detailed breakdown of the environmental impact of a full model development pipeline, from pretraining through supervised fine-tuning, preference optimization, and reinforcement learning, for Olmo 3, a family of 7 billion and 32 billion parameter models in both instruction-following and reasoning variants. We find that reasoning models are 17x more expensive to post-train than their instruction-tuned counterparts in terms of datacenter energy, driven by reinforcement learning rollout generation. Development costs (including experimentation, failed runs, and ablations) account for 82.2% of total compute, a roughly 65% relative increase over the ~50% reported for pretraining-focused pipelines in prior work. In total, we estimate our model development process consumed ~12.3 GWh of datacenter energy, emitted 4,251 tCO2eq, and consumed 15,887 kL of water, with water consumption driven entirely by power generation infrastructure rather than datacenter cooling. These costs, which are almost entirely unreported by model developers, are growing rapidly as post-training pipelines become more complex, and must be accounted for in environmental reporting standards and by the research community working to reduce AI's environmental impact.
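The headline figures can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and the per-kWh intensities are merely what the abstract's rounded totals imply, not values reported by the authors. It also assumes the "~65% increase" is a relative change between the two development shares.

```python
# Totals as stated in the abstract (energy is approximate: "~12.3 GWh").
ENERGY_GWH = 12.3
EMISSIONS_TCO2EQ = 4251
WATER_KL = 15887
DEV_SHARE = 0.822        # development share of total compute
PRIOR_DEV_SHARE = 0.50   # "~50%" attributed to prior work


def implied_intensities(energy_gwh, emissions_t, water_kl):
    """Return (kgCO2eq per kWh, litres per kWh) implied by the totals."""
    kwh = energy_gwh * 1e6                 # 1 GWh = 1,000,000 kWh
    carbon = emissions_t * 1000 / kwh      # tonnes -> kg
    water = water_kl * 1000 / kwh          # kilolitres -> litres
    return carbon, water


def relative_increase(new, old):
    """Relative change from old to new, as a fraction of old."""
    return (new - old) / old


carbon_kg_per_kwh, water_l_per_kwh = implied_intensities(
    ENERGY_GWH, EMISSIONS_TCO2EQ, WATER_KL
)
dev_increase = relative_increase(DEV_SHARE, PRIOR_DEV_SHARE)
# Compute spent on development per unit of compute in the final models.
dev_to_final_ratio = DEV_SHARE / (1 - DEV_SHARE)

print(f"Implied carbon intensity: {carbon_kg_per_kwh:.3f} kgCO2eq/kWh")
print(f"Implied water intensity:  {water_l_per_kwh:.2f} L/kWh")
print(f"Relative increase in development share: {dev_increase:.1%}")
print(f"Development-to-final compute ratio: {dev_to_final_ratio:.1f}x")
```

Under these assumptions the totals imply roughly 0.35 kgCO2eq and 1.29 L of water per kWh, and the development share rises by 64.4% relative to the earlier ~50%, which matches the "roughly 65%" in the abstract. An 82.2% development share also means about 4.6 units of development compute for every unit spent on the final models.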