🤖 AI Summary
Weather forecasting relies on accurately retrieving the global atmospheric initial state from massive, heterogeneous observational data; however, existing data assimilation methods face bottlenecks in resolution, physical consistency, and task generalizability. This paper introduces the first unified latent-space diffusion framework based on score matching, instantiated as a 1.5-billion-parameter spatiotemporal latent diffusion model pretrained on ERA5 data. The framework enables conditional diffusion sampling to generate globally coherent atmospheric trajectories at 0.25° spatial resolution and 1-hour temporal resolution. Crucially, it adapts—without retraining—to arbitrary observation types and downstream tasks (e.g., reanalysis, filtering, forecasting) while strictly enforcing physical constraints at the global scale. Experimental results demonstrate significantly reduced observational reconstruction error and state-of-the-art short-term forecast accuracy.
📝 Abstract
Deep learning has transformed weather forecasting by improving both its accuracy and computational efficiency. However, before any forecast can begin, weather centers must identify the current atmospheric state from vast amounts of observational data. To address this challenging problem, we introduce Appa, a score-based data assimilation model producing global atmospheric trajectories at 0.25-degree resolution and 1-hour intervals. Powered by a 1.5B-parameter spatio-temporal latent diffusion model trained on ERA5 reanalysis data, Appa can be conditioned on any type of observations to infer the posterior distribution of plausible state trajectories, without retraining. Our unified probabilistic framework flexibly tackles multiple inference tasks -- reanalysis, filtering, and forecasting -- using the same model, eliminating the need for task-specific architectures or training procedures. Experiments demonstrate physical consistency on a global scale and good reconstructions from observations, while showing competitive forecasting skills. Our results establish latent score-based data assimilation as a promising foundation for future global atmospheric modeling systems.