Can AI Weather Models Predict Beyond Two Weeks? A Quantitative Benchmark and Analysis of Long Rollouts

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AI weather models often become unstable when rolled out beyond two weeks, and their failure modes have not been systematically characterized. This study runs year-long rollouts of nine state-of-the-art AI weather models and proposes a taxonomy of long-range failures with three regimes: blow-up, drift, and loss of seasonality. It finds that stability depends on how a model treats small spatio-temporal scales: unstable models amplify high-frequency energy, while stable models act as denoisers when noise is added to their inputs. Stable models nevertheless generate distinct weather trajectories conditioned on the initial state. Ablation studies on Vision Transformer (ViT) weather model architectures support these findings and tie long-rollout stability to architectural design choices.
📝 Abstract
While AI weather models excel at short-to-medium range forecasts (up to 15 days), they frequently suffer from ill-defined "instabilities" when rolled out over longer horizons. This work addresses the lack of a formal taxonomy by categorizing these failures into three distinct regimes: blow-up, drift, and loss of seasonality, through year-long rollouts of nine state-of-the-art AI weather models. Our analysis reveals that stability hinges on the treatment of small spatio-temporal scales: unstable models amplify high-frequency energy, while stable models act as denoisers when noise is added to their inputs. Far from reducing these models to mere stochastic parrots, our findings highlight that stable models generate unique weather trajectories, conditioned on the initial state. We verify our findings through ablation studies on architectural design choices, conducted using state-of-the-art Vision Transformer (ViT) AI weather model architectures.
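The two diagnostics named in the abstract, high-frequency energy and denoising behaviour under input noise, can be illustrated with a small NumPy sketch. This is not the paper's code: the function names, the `cutoff` wavenumber, the noise level, and the two toy one-step "models" (a neighbour-averaging step and a sharpening step on a periodic grid) are all assumptions chosen only to make the idea concrete.

```python
import numpy as np


def high_freq_energy_fraction(field, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` (cycles per grid point).

    `cutoff` is an illustrative choice, not a value from the paper.
    """
    spectrum = np.abs(np.fft.fft2(field)) ** 2
    ky = np.fft.fftfreq(field.shape[0])[:, None]
    kx = np.fft.fftfreq(field.shape[1])[None, :]
    k = np.sqrt(kx ** 2 + ky ** 2)
    return float(spectrum[k > cutoff].sum() / spectrum.sum())


def noise_gain(step, state, noise_std=0.1, seed=0):
    """Ratio of output perturbation to input perturbation for one step.

    A gain below 1 means the step damps the injected noise (denoiser-like);
    a gain above 1 means it amplifies the noise.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_std, size=state.shape)
    diff = step(state + noise) - step(state)
    return float(np.linalg.norm(diff) / np.linalg.norm(noise))


def smooth_step(x):
    """Hypothetical stable model: average each cell with its 4 neighbours."""
    return 0.2 * (x + np.roll(x, 1, 0) + np.roll(x, -1, 0)
                  + np.roll(x, 1, 1) + np.roll(x, -1, 1))


def sharpen_step(x):
    """Hypothetical unstable model: boosts small-scale structure."""
    return x + 0.5 * (x - smooth_step(x))


def rollout(step, state, n_steps):
    """Apply `step` autoregressively `n_steps` times."""
    for _ in range(n_steps):
        state = step(state)
    return state


def make_initial_state(n=64, seed=1):
    """Large-scale wave plus weak small-scale noise on a periodic grid."""
    rng = np.random.default_rng(seed)
    y, x = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    wave = np.sin(2 * np.pi * 2 * x / n) * np.cos(2 * np.pi * 2 * y / n)
    return wave + 0.01 * rng.normal(size=(n, n))


if __name__ == "__main__":
    state0 = make_initial_state()
    for name, step in [("smooth", smooth_step), ("sharpen", sharpen_step)]:
        final = rollout(step, state0, n_steps=10)
        print(name,
              "high-frequency fraction:", high_freq_energy_fraction(final),
              "noise gain:", noise_gain(step, state0))
```

In this toy setting the sharpening step shows a growing high-frequency energy fraction and a noise gain above 1, while the averaging step shows the opposite. For a real model, `step` would wrap one autoregressive forecast step, and the spectrum would more naturally be computed in spherical harmonics rather than with a planar FFT.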
Problem

Research questions and friction points this paper is trying to address.

AI weather models
long-term forecasting
instability
rollout
spatio-temporal scales
Innovation

Methods, ideas, or system contributions that make the work stand out.

long-range weather forecasting
AI model stability
spatio-temporal scales
Vision Transformer
model rollouts