🤖 AI Summary
This work addresses trajectory drift in autoregressive neural simulators during long-horizon rollouts, which it attributes to transient error amplification caused by non-normal and non-commuting Jacobians along the rollout trajectory. To mitigate this, the study introduces commutativity regularization, which adds two penalties to the training loss: one on the normality defect of individual Jacobians and one on the commutator norm of Jacobians across steps. Both penalties are estimated with Jacobian-vector products and add no inference-time cost. A propagator bound quantifies rollout error under approximate commutativity and normality. Evaluated with UNet and FNO variants on synthetic and real spatiotemporal data in 1D and 2D, the approach yields stable rollouts over thousands of steps and improves FourCastNet climate forecasts on ERA5 without new data, with the largest gains out-of-distribution.
📝 Abstract
Autoregressive neural simulators now match classical solvers on short-horizon prediction of physical systems, yet their accuracy degrades rapidly when rolled out over long horizons. In this work, we identify transient amplification of perturbations around rollout trajectories as a structural mechanism driving rollout error. Using a linearization analysis, we show that when the Jacobians along an autoregressive trajectory are non-normal and non-commuting, the model amplifies errors transiently, causing rollout drift even when the overall system is asymptotically stable. Building on this analysis, we propose commutativity regularization: a combination of two penalties designed to reduce the normality defect of individual Jacobians and the commutator norm of Jacobians across steps. The penalties are estimated with Jacobian-vector products and have no inference-time cost. We derive a propagator bound that quantifies rollout error under approximate commutativity and normality. We evaluate UNet and FNO variants with commutativity regularization on 1D and 2D spatio-temporal data in synthetic and real settings, showing successful long-horizon rollouts over thousands of steps. Further, we show that the method improves FourCastNet climate forecasts on ERA5 without using any new data. The gain is most pronounced out-of-distribution: trained on trajectories of a few hundred steps, regularized models remain in-distribution for thousands of rollout steps on initial conditions where baselines diverge.
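To make the mechanism concrete, the following is a minimal sketch in plain Python, not the paper's implementation. It uses a toy 2x2 matrix as a stand-in for a rollout Jacobian to show transient amplification in an asymptotically stable system, and it estimates the two penalties using only matrix-vector products. The function names, the toy matrices, and the choice of Rademacher (±1) probe vectors are all assumptions made for illustration; the abstract does not specify the exact form of the estimator. In an autodiff setting, the products with `A` would be Jacobian-vector products and the products with its transpose would be vector-Jacobian products.

```python
import itertools
import math


def matvec(M, v):
    """Matrix-vector product; stands in for a Jacobian-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def rollout_norms(M, v0, steps):
    """Norm of a perturbation v0 after 1..steps applications of M."""
    norms, v = [], list(v0)
    for _ in range(steps):
        v = matvec(M, v)
        norms.append(norm(v))
    return norms


def commutator_penalty(A, B, probes):
    """Mean over probes of ||(AB - BA) v||^2, using matvecs only.

    For probes with E[v v^T] = I this estimates ||AB - BA||_F^2.
    """
    total = 0.0
    for v in probes:
        ab_v = matvec(A, matvec(B, v))
        ba_v = matvec(B, matvec(A, v))
        total += sum((x - y) ** 2 for x, y in zip(ab_v, ba_v))
    return total / len(probes)


def normality_penalty(A, probes):
    """Estimate of the normality defect ||A^T A - A A^T||_F^2."""
    return commutator_penalty(transpose(A), A, probes)


# Hypothetical toy "Jacobians": both have spectral radius 0.9 (stable).
A = [[0.9, 5.0], [0.0, 0.9]]      # non-normal: large off-diagonal coupling
N_mat = [[0.9, 0.0], [0.0, 0.9]]  # normal: a scaled identity

# All four Rademacher vectors in 2D; their average outer product is the
# identity, so the mean below equals the Frobenius norm squared exactly.
probes = [list(p) for p in itertools.product((-1.0, 1.0), repeat=2)]

norms_A = rollout_norms(A, [0.0, 1.0], 200)
norms_N = rollout_norms(N_mat, [0.0, 1.0], 200)

print("peak growth, non-normal:", max(norms_A))
print("peak growth, normal:    ", max(norms_N))
print("final norm, non-normal: ", norms_A[-1])
print("normality penalty A:    ", normality_penalty(A, probes))
print("normality penalty N:    ", normality_penalty(N_mat, probes))
print("commutator penalty A,N: ", commutator_penalty(A, N_mat, probes))
```

In this toy case the non-normal matrix amplifies the perturbation well above its initial size before it eventually decays, while the normal matrix with the same eigenvalues contracts at every step. In higher dimensions the probes would typically be sampled at random rather than enumerated, which turns the exact mean into a stochastic estimate.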