🤖 AI Summary
Existing diffusion-based approaches to atmospheric data assimilation are predominantly data-driven and often overlook the physical laws that govern atmospheric dynamics, so they can yield physically inconsistent reconstructions. To address this, the authors propose PhyDA, a physics-guided diffusion framework that incorporates physical laws, expressed as partial differential equations (PDEs), into training. PhyDA introduces a Physically Regularized Diffusion Objective, which penalizes deviations from those PDEs, and a Virtual Reconstruction Encoder, which maps sparse observations to structured latent representations for observation-guided conditional generation. In experiments on the ERA5 reanalysis dataset, PhyDA achieves higher reconstruction accuracy and better physical plausibility than state-of-the-art baselines. The results suggest that combining deep generative modeling with domain-specific physical knowledge is a promising direction for real-world data assimilation systems.
📝 Abstract
Data Assimilation (DA) plays a critical role in atmospheric science by reconstructing spatially continuous estimates of the system state, which serve as initial conditions for scientific analysis. While recent advances in diffusion models have shown great potential for DA tasks, most existing approaches remain purely data-driven and often overlook the physical laws that govern complex atmospheric dynamics. As a result, they may yield physically inconsistent reconstructions that impair downstream applications. To overcome this limitation, we propose PhyDA, a physics-guided diffusion framework designed to ensure physical coherence in atmospheric data assimilation. PhyDA introduces two key components: (1) a Physically Regularized Diffusion Objective that integrates physical constraints into the training process by penalizing deviations from known physical laws expressed as partial differential equations, and (2) a Virtual Reconstruction Encoder that bridges observational sparsity by producing structured latent representations, further enhancing the model's ability to infer complete and physically coherent states. Experiments on the ERA5 reanalysis dataset demonstrate that PhyDA achieves superior accuracy and better physical plausibility compared to state-of-the-art baselines. Our results emphasize the importance of combining generative modeling with domain-specific physical knowledge and show that PhyDA offers a promising direction for improving real-world data assimilation systems.
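To make the idea of a physically regularized objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes a noise-prediction diffusion loss and uses a zero-divergence (incompressible continuity) residual on a periodic grid as a stand-in PDE; the abstract does not say which equations PhyDA enforces. The function names, the weight `lam`, and the two-channel `(u, v)` layout are all hypothetical.

```python
import numpy as np


def divergence(u, v, dx=1.0, dy=1.0):
    """Centered finite-difference divergence on a periodic grid.

    u, v: 2-D arrays of shape (H, W); axis 1 is x, axis 0 is y.
    """
    du_dx = (np.roll(u, -1, axis=1) - np.roll(u, 1, axis=1)) / (2.0 * dx)
    dv_dy = (np.roll(v, -1, axis=0) - np.roll(v, 1, axis=0)) / (2.0 * dy)
    return du_dx + dv_dy


def physics_regularized_loss(eps_pred, eps_true, x0_pred, lam=0.1):
    """Denoising MSE plus a penalty on the PDE residual of the predicted state.

    eps_pred, eps_true: predicted and true noise, same shape.
    x0_pred: predicted clean state, shape (2, H, W) with channels (u, v).
    lam: hypothetical weight balancing the data and physics terms.
    Returns (total, denoising term, physics term).
    """
    denoise = np.mean((eps_pred - eps_true) ** 2)
    # Assumed stand-in PDE: the wind field should be divergence-free.
    residual = divergence(x0_pred[0], x0_pred[1])
    physics = np.mean(residual ** 2)
    return denoise + lam * physics, denoise, physics


if __name__ == "__main__":
    n = 16
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    X, Y = np.meshgrid(x, x)  # X varies along axis 1, Y along axis 0
    state = np.stack([np.sin(X), np.zeros_like(X)])  # a divergent field
    noise = np.zeros_like(state)
    total, denoise, physics = physics_regularized_loss(noise, noise, state)
    print(total, denoise, physics)
```

Because the physics term is a penalty added to the loss, it encourages rather than guarantees consistency with the chosen PDE, which matches the abstract's wording of "penalizing deviations" from known physical laws.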