Evaluating Skill and Stability of ArchesWeather and ArchesWeatherGen under Multi-Decadal Climate Simulations

📅 2026-05-28
🤖 AI Summary
This study adapts two machine learning models originally developed for short-term weather forecasting to multi-decadal climate simulation and evaluates their long-term stability and their fidelity in representing climate statistics. Following the AIMIP Phase 1 protocol, the deterministic model ArchesWeather and its flow-matching-based probabilistic counterpart, ArchesWeatherGen, are configured as forced atmospheric models conditioned on monthly mean sea surface temperature (SST) and sea ice cover (SIC); ArchesWeatherGen additionally provides ensemble-based uncertainty quantification. In the forced configuration, both models produce stable long-term simulations with a stable annual cycle, and they reproduce ERA5's climatology, large-scale circulations, interannual variability, and the tails of the distributions. The results suggest that such data-driven models are promising tools for climate simulation.
📝 Abstract
We evaluate the climate simulation capabilities of ArchesWeather and ArchesWeatherGen, two machine learning models originally trained for weather forecasting and evaluated up to a 10-day lead time. ArchesWeather is a deterministic model, while ArchesWeatherGen is a probabilistic flow-matching model that builds on ArchesWeather's forecasts and enables ensemble-based uncertainty quantification. In this work, we adapt these models to act as forced atmospheric models by adding conditioning on the monthly mean sea surface temperature (SST) and sea ice cover (SIC) as boundary conditions. In particular, we follow the AI Model Intercomparison Project (AIMIP) Phase 1 protocol, which, analogous to the Atmospheric Model Intercomparison Project (AMIP), proposes a standardized experimental setup to evaluate the climate skill of ML-based forced atmospheric models. We present a comprehensive evaluation of both models under these conditions, including a comparison against numerical climate models, ablation studies that examine key design choices in the extension, and an analysis of forced versus unforced configurations. Although the models were originally developed for weather forecasting, we demonstrate that forced configurations of ArchesWeather and ArchesWeatherGen produce stable long-term climate simulations, maintain a stable annual cycle, and capture the drift of many climate variables. The models faithfully reproduce ERA5's climatology, large-scale circulations, and interannual variability, and they capture the tails of the distributions.
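To make the idea of a forced atmospheric model concrete, here is a minimal Python sketch of an autoregressive rollout in which each step is conditioned on the prescribed monthly mean SST and SIC for the current calendar month. It is an illustration only, not the authors' implementation. The names `toy_step` and `forced_rollout`, the scalar state, the relaxation rule, the 24-hour step, and the forcing values are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

# (monthly mean SST in kelvin, sea ice cover as a fraction in [0, 1])
Forcing = Tuple[float, float]


def toy_step(state: float, sst: float, sic: float) -> float:
    """Hypothetical stand-in for a learned one-step forecast model.

    Relaxes a scalar "air temperature" toward an equilibrium set by the
    boundary forcing. A real model would map full atmospheric fields.
    """
    equilibrium = sst - 5.0 * sic  # toy assumption: more ice, colder target
    return state + 0.1 * (equilibrium - state)


def forced_rollout(
    step_fn: Callable[[float, float, float], float],
    init_state: float,
    start: datetime,
    n_steps: int,
    dt: timedelta,
    monthly_forcing: Dict[Tuple[int, int], Forcing],
) -> List[float]:
    """Autoregressive rollout conditioned on monthly mean boundary forcing."""
    state, time = init_state, start
    trajectory = [state]
    for _ in range(n_steps):
        # Look up the prescribed forcing for the current calendar month.
        sst, sic = monthly_forcing[(time.year, time.month)]
        # The model consumes its own previous output plus the forcing.
        state = step_fn(state, sst, sic)
        time += dt
        trajectory.append(state)
    return trajectory


# Illustrative forcing values for two months (made up for this sketch).
forcing = {(2000, 1): (271.0, 0.8), (2000, 2): (273.0, 0.6)}
traj = forced_rollout(
    toy_step, 270.0, datetime(2000, 1, 1), 60, timedelta(hours=24), forcing
)
```

In this toy setup the state stays bounded and follows the forcing as the month changes, which is the qualitative behavior a forced configuration is meant to provide. An unforced rollout would correspond to a `step_fn` that ignores `sst` and `sic`.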
Problem

Research questions and friction points this paper is trying to address.

climate simulation
machine learning
weather forecasting
forced atmospheric model
long-term stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

forced atmospheric modeling
machine learning for climate
multi-decadal simulation
probabilistic flow-matching
AIMIP