🤖 AI Summary
This study addresses inherent limitations of conventional weather and climate prediction models, namely spectral bias and ensemble miscalibration, in high-resolution Earth system modeling. We propose AERIS, the first pixel-level Swin diffusion transformer architecture scaled from 1.3B to 80B parameters, and introduce SWiPe, a parallelism strategy that composes window, sequence, and pipeline parallelism to shard window-based transformers without added cross-device communication cost or an increased global batch size, substantially improving the training stability and scalability of very large diffusion models. Deployed on the Aurora supercomputer, AERIS sustains 10.21 ExaFLOPS in mixed precision with a peak of 11.21 ExaFLOPS, with weak and strong scaling efficiencies of 95.5% and 81.6%, respectively. Forecast accuracy surpasses that of the ECMWF Integrated Forecasting System (IFS) ensemble, while remaining stable on seasonal timescales out to 90 days.
📝 Abstract
Generative machine learning offers new opportunities to better understand complex Earth system dynamics. Recent diffusion-based methods address spectral biases and improve ensemble calibration in weather forecasting compared to deterministic methods, yet have so far proven difficult to scale stably at high resolutions. We introduce AERIS, a 1.3B to 80B parameter pixel-level Swin diffusion transformer, to address this gap, and SWiPe, a generalizable technique that composes window parallelism with sequence and pipeline parallelism to shard window-based transformers without added communication cost or increased global batch size. On Aurora (10,080 nodes), AERIS sustains 10.21 ExaFLOPS (mixed precision) and a peak performance of 11.21 ExaFLOPS with $1 \times 1$ patch size on the 0.25° ERA5 dataset, achieving 95.5% weak scaling efficiency and 81.6% strong scaling efficiency. AERIS outperforms the IFS ENS and remains stable on seasonal scales to 90 days, highlighting the potential of billion-parameter diffusion models for weather and climate prediction.
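To give a sense of why such sharding matters at pixel level: a 0.25° ERA5 field has 721 × 1440 grid points, so a $1 \times 1$ patch size yields roughly a million tokens per sample. The sketch below is not the paper's implementation; the window size, group size, and round-robin assignment are assumptions chosen for illustration. It shows one way the windows of a Swin-style transformer could be distributed across a window-parallel device group without changing the global batch size.

```python
# Illustrative sketch only (not the authors' SWiPe implementation).
# Window size and window-parallel group size below are assumed values.
import math

def window_shards(grid_h: int, grid_w: int, window: int, wp_group_size: int):
    """Partition an H x W token grid into non-overlapping Swin windows and
    assign each window to a rank in the window-parallel group (round-robin)."""
    n_win_h = math.ceil(grid_h / window)
    n_win_w = math.ceil(grid_w / window)
    shards = {rank: [] for rank in range(wp_group_size)}
    for wi in range(n_win_h):
        for wj in range(n_win_w):
            rank = (wi * n_win_w + wj) % wp_group_size
            shards[rank].append((wi, wj))
    return shards

# 0.25-degree ERA5 grid with 1x1 patches -> one token per grid point.
H, W = 721, 1440                 # ERA5 latitude x longitude grid points
tokens = H * W                   # ~1.04M tokens per sample at patch size 1
shards = window_shards(H, W, window=8, wp_group_size=4)  # hypothetical settings
print(f"{tokens} tokens, "
      f"{sum(len(v) for v in shards.values())} windows total, "
      f"{len(shards[0])} windows on rank 0")
```

Because window attention only mixes tokens inside each window, splitting whole windows across ranks keeps the attention computation local to a device, which is consistent with the abstract's claim that SWiPe adds no communication cost beyond what sequence and pipeline parallelism already require.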