AI Summary
This work addresses the training and inference instability inherent in long-horizon autoregressive modeling of chaotic dynamical systems by embedding an autoregressive Transformer in a shooting-based mixed finite element framework. A Vision Transformer supplies latent tokens that admit structure-preserving dynamics, and the resulting hybrid neural time integrator is provably stable: it preserves discrete energies in forward problems and has uniformly bounded gradients during training, which avoids gradient explosion. The approach outperforms existing foundation models with 65× fewer parameters, and a "mini-foundation" model of a fusion component trained on only 12 simulations achieves a 9,000× speedup over particle-in-cell simulation.
Abstract
For autoregressive modeling of chaotic dynamical systems over long time horizons, the stability of both training and inference is a major challenge in building scientific foundation models. We present a hybrid technique in which an autoregressive transformer is embedded within a novel shooting-based mixed finite element scheme, exposing topological structure that enables provable stability. For forward problems, we prove preservation of discrete energies, while for training we prove uniform bounds on gradients, provably avoiding the exploding gradient problem. Combined with a vision transformer, this yields latent tokens admitting structure-preserving dynamics. We outperform modern foundation models in long-horizon forecasting of chaotic systems with a $65\times$ reduction in model parameters. A "mini-foundation" model of a fusion component shows that 12 simulations suffice to train a real-time surrogate, achieving a $9{,}000\times$ speedup over particle-in-cell simulation.