🤖 AI Summary
Problem: Machine learning models for geophysical systems often achieve high short-term prediction accuracy but fail to preserve long-term statistical properties, such as marginal distributions of state variables, and struggle to maintain dynamical consistency under sparse-data regimes.
Method: This paper proposes a distribution-informed machine learning framework that incorporates known marginal distributions as physical priors. It enforces non-local, physics-grounded statistical consistency via the kernelized Stein discrepancy (KSD) in a reproducing kernel Hilbert space, coupled with a normalization-based calibration step, to jointly optimize pointwise predictions and fidelity to the long-term attractor.
Results: Evaluated on offline CO₂ flux inversion and online quasi-geostrophic flow simulation, the method improves short-term forecasting accuracy while preserving long-term marginal distributions, easing the trade-off between pointwise predictive accuracy and long-term statistical bias under data sparsity.
📝 Abstract
Machine learning (ML) has shown significant promise in studying complex geophysical dynamical systems, including turbulence and climate processes. Such systems often display sensitive dependence on initial conditions, reflected in positive Lyapunov exponents, where even small perturbations in short-term forecasts can lead to large deviations in long-term outcomes. Thus, meaningful inference requires not only accurate short-term predictions but also consistency with the system's long-term attractor, as captured by the marginal distribution of state variables. Existing approaches attempt to address this challenge by incorporating spatial and temporal dependence, but these strategies become impractical when data are extremely sparse. In this work, we show that prior knowledge of marginal distributions offers valuable complementary information to short-term observations, motivating a distribution-informed learning framework. We introduce a calibration algorithm based on normalization and the Kernelized Stein Discrepancy (KSD) to enhance ML predictions. The method employs KSD within a reproducing kernel Hilbert space to calibrate model outputs, improving their fidelity to known physical distributions. This not only sharpens pointwise predictions but also enforces consistency with non-local statistical structures rooted in physical principles. Through synthetic experiments, spanning offline climatological CO₂ fluxes and online quasi-geostrophic flow simulations, we demonstrate the robustness and broad utility of the proposed framework.
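To make the distribution-consistency idea concrete, below is a minimal NumPy sketch of an empirical kernelized Stein discrepancy between model outputs and a known target marginal, i.e., the kind of quantity the framework uses to score distributional fidelity. The RBF kernel, the standard-normal target, and the fixed bandwidth are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the paper's code): a V-statistic estimate of the
# kernelized Stein discrepancy (KSD^2) between samples {x_i} and a target
# marginal p, using an RBF kernel and the score function s(x) = grad log p(x).
# The standard-normal target and fixed bandwidth are illustrative assumptions.
import numpy as np

def ksd_rbf(x, score_fn, bandwidth=1.0):
    """Biased (V-statistic) estimate of KSD^2 for samples x of shape (n, d)."""
    n, d = x.shape
    s = score_fn(x)                                 # (n, d) scores at samples
    diff = x[:, None, :] - x[None, :, :]            # (n, n, d) pairwise x_i - x_j
    sqdist = np.sum(diff**2, axis=-1)               # (n, n) squared distances
    h2 = bandwidth**2
    k = np.exp(-sqdist / (2.0 * h2))                # RBF kernel matrix

    # Stein kernel u_p(x_i, x_j), assembled term by term.
    term1 = (s @ s.T) * k                           # s(x)^T s(y) k(x, y)
    grad_y_k = diff / h2 * k[..., None]             # grad_y k(x, y)
    term2 = np.einsum('id,ijd->ij', s, grad_y_k)    # s(x)^T grad_y k
    grad_x_k = -diff / h2 * k[..., None]            # grad_x k(x, y)
    term3 = np.einsum('jd,ijd->ij', s, grad_x_k)    # grad_x k^T s(y)
    term4 = k * (d / h2 - sqdist / h2**2)           # trace(grad_x grad_y k)
    u = term1 + term2 + term3 + term4
    return u.mean()

# Example: score deviation of model outputs from a standard-normal marginal,
# a stand-in for the known physical distribution of a state variable.
score_standard_normal = lambda x: -x                # grad log N(0, I)
samples = np.random.randn(256, 2)                   # hypothetical model outputs
print(ksd_rbf(samples, score_standard_normal))
```

In a calibration loop, a scalar like this could serve as the distribution-consistency term alongside a pointwise prediction loss; the paper's exact normalization step and weighting are not reproduced here.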