🤖 AI Summary
This work addresses the limitations of existing latent diffusion models in high-resolution ensemble weather forecasting: meteorological fields lack a universal foundation model and explicit semantic structure, and imposing identical spectral regularization across variables yields inconsistent regularization strength over multivariate data. To overcome this, the authors propose a joint conditional framework integrating Variable-Aware Masked Frequency Modeling (VA-MFM) with a 3D Masked Autoencoder (3D-MAE): the 3D-MAE captures evolving weather-state features as additional conditioning, while VA-MFM adaptively modulates spectral regularization strength per variable, significantly enhancing diffusability in the latent space. The method outperforms the ECMWF ENS system in short-range forecasts, matches its skill in long-range predictions, and generates a 15-day global forecast at 6-hour intervals within five minutes on a single NVIDIA H200 GPU, enabling highly efficient parallel ensemble production.
📝 Abstract
Latent diffusion models (LDMs) suffer from limited diffusability in high-resolution (≤0.25°) ensemble weather forecasting, where diffusability characterizes how easily a latent data distribution can be modeled by a diffusion process. Unlike natural image fields, meteorological fields lack task-agnostic foundation models and explicit semantic structures, making vision-foundation-model (VFM)-based regularization inapplicable. Moreover, existing frequency-based approaches impose identical spectral regularization across channels under a homogeneity assumption, which leads to uneven regularization strength under the inter-variable spectral heterogeneity of multivariate meteorological data. To address these challenges, we propose a 3D Masked AutoEncoder (3D-MAE) that encodes weather-state evolution features as additional conditioning for the diffusion model, together with a Variable-Aware Masked Frequency Modeling (VA-MFM) strategy that adaptively selects thresholds based on the spectral energy distribution of each variable. Combining these components, we propose PuYun-LDM, which enhances latent diffusability and achieves superior performance to ENS at short lead times while remaining comparable to ENS at longer horizons. PuYun-LDM generates a 15-day global forecast with a 6-hour temporal resolution in five minutes on a single NVIDIA H200 GPU, while ensemble forecasts can be efficiently produced in parallel.
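The abstract does not specify how VA-MFM's per-variable thresholds are chosen; one plausible rule consistent with "adaptively selects thresholds based on the spectral energy distribution of each variable" is an energy-fraction cutoff: pick, for each variable, the radial frequency below which a fixed fraction of spectral power lies. The sketch below (NumPy, with the function name, `energy_frac` parameter, and synthetic fields all hypothetical illustrations, not the paper's implementation) shows how smooth and rough fields naturally receive different cutoffs.

```python
import numpy as np

def spectral_threshold(field: np.ndarray, energy_frac: float = 0.95) -> float:
    """Radial-frequency cutoff capturing `energy_frac` of the field's spectral energy.

    `field` is a 2D (lat, lon) slice of one meteorological variable. Smooth,
    large-scale variables concentrate power at low wavenumbers and get a small
    cutoff; broadband variables get a larger one -- hence variable-aware
    regularization strength.
    """
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    h, w = power.shape
    ky, kx = np.indices((h, w))
    r = np.hypot(ky - h // 2, kx - w // 2)          # radial wavenumber per bin
    order = np.argsort(r.ravel())                    # sort bins by radius
    cum = np.cumsum(power.ravel()[order])
    cum /= cum[-1]                                   # cumulative energy fraction
    idx = np.searchsorted(cum, energy_frac)
    return r.ravel()[order][idx] / (min(h, w) / 2)   # normalized cutoff radius

# Synthetic contrast: a low-frequency field vs. broadband noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
smooth = np.sin(x)[:, None] * np.cos(x)[None, :]     # energy at wavenumber ~1
rough = rng.standard_normal((64, 64))                # roughly flat spectrum
t_smooth = spectral_threshold(smooth)
t_rough = spectral_threshold(rough)
```

Under this rule `t_smooth` comes out far below `t_rough`, so a frequency-masking loss keyed to these cutoffs regularizes each variable at a comparable effective strength rather than applying one global band to all channels.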