AI Summary
Accurate and scalable state estimation remains a critical bottleneck in Earth system prediction, hindering robust uncertainty quantification and extreme event forecasting. This work proposes a unified single-stage generative data assimilation framework that reframes the problem as Bayesian posterior sampling. It introduces STORM, a spatiotemporal Transformer whose global attention mechanism scales linearly with sequence length, avoiding the quadratic cost of standard attention. On 32,768 GPUs of the Frontier supercomputer, the method achieves 63% strong scaling efficiency and sustained performance of 1.6 ExaFLOPs. It scales to 20 billion spatiotemporal tokens, marking the first kilometer-scale global Earth system modeling at this scale and supporting simulations spanning 177,000 time steps.
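For readers unfamiliar with the metric, strong scaling efficiency is the fraction of ideal speedup retained when a fixed-size problem runs on more GPUs. The sketch below uses the standard definition. The function name and the timings are hypothetical illustrations, not the paper's baseline configuration or measurements.

```python
def strong_scaling_efficiency(base_gpus, base_time, gpus, time):
    """Fraction of the ideal speedup retained when a fixed-size problem
    is moved from `base_gpus` to `gpus` (standard definition)."""
    if min(base_gpus, gpus) <= 0 or min(base_time, time) <= 0:
        raise ValueError("GPU counts and timings must be positive")
    speedup = base_time / time          # measured speedup
    ideal_speedup = gpus / base_gpus    # speedup under perfect scaling
    return speedup / ideal_speedup


# Hypothetical timings (NOT from the paper): 320 s on 1,024 GPUs
# and 16 s on 32,768 GPUs for the same problem size.
eff = strong_scaling_efficiency(1024, 320.0, 32768, 16.0)
print(f"{eff:.3f}")  # 0.625
```

In this made-up case the run is 20x faster on 32x more GPUs, so 62.5% of the ideal speedup is retained.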
Abstract
Accurate weather and climate prediction relies on data assimilation (DA), which estimates the Earth system state by integrating observations with models. While exascale computing has significantly advanced Earth simulation, scalable and accurate inference of the Earth system state remains a fundamental bottleneck, limiting uncertainty quantification and prediction of extreme events. We introduce a unified one-stage generative DA framework that reformulates assimilation as Bayesian posterior sampling, replacing the conventional forecast-update cycle with compute-dense, GPU-efficient inference. At its core is STORM, a novel spatiotemporal transformer with a global attention algorithm that scales linearly in complexity, breaking the quadratic attention barrier. On 32,768 GPUs of the Frontier supercomputer, our method achieves 63% strong scaling efficiency and 1.6 ExaFLOP sustained performance. We further scale to 20 billion spatiotemporal tokens, enabling km-scale global modeling over 177k temporal frames. These regimes were previously unreachable, and the results establish a new paradigm for Earth system prediction.
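As a reminder of what "Bayesian posterior sampling" means here, the generic form of Bayes' rule for state estimation is given below. The symbols are assumptions introduced for illustration (x for the Earth system state, y for the observations); the abstract does not give the paper's own notation or factorization.

```latex
% x: Earth system state (assumed symbol), y: observations (assumed symbol)
p(x \mid y) \;=\; \frac{p(y \mid x)\, p(x)}{p(y)}
\;\propto\; \underbrace{p(y \mid x)}_{\text{observation likelihood}}
\;\underbrace{p(x)}_{\text{prior over states}}
```

In this view, assimilation amounts to drawing samples from p(x | y), with a generative model supplying the prior over states and the observations entering through the likelihood.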