🤖 AI Summary
This work addresses surrogate modeling of high-dimensional physical dynamical systems, which demands both short-term prediction accuracy and fidelity to long-term statistical structure. Existing methods often compromise this balance in turbulent flows: neural operators can drift over long rollouts, while rolling diffusion and latent generative surrogates require multi-step denoising, noise-schedule design, or auxiliary compression models. The authors propose MeLISA, a latent-free autoregressive generative surrogate that produces each forecast block with a single model evaluation, without latent encoders or iterative denoising at inference time. Built on pixel-space MeanFlow, MeLISA defines a blockwise stochastic transition kernel and stabilizes long-horizon rollouts with a Window-Consistency MeanFlow objective and a Time Increment Consistency loss. With both UNet and DiT backbones, the model is evaluated on 256×256 extended 2D Kolmogorov flow and a 192×192 turbulent channel-flow slice, where it outperforms neural-operator baselines on short-term forecasts and on long-horizon statistics such as energy spectra and turbulent kinetic energy, with inference speeds comparable to, and in some cases faster than, neural operators. Compact 3.7–5.7M-parameter variants are already parameter-efficient, and DiT variants scale up to 150M parameters.
📝 Abstract
Fast surrogate modeling for high-dimensional physical dynamics requires more than low short-term error: useful models must roll out efficiently while preserving the statistical structure of long trajectories. Neural operators provide inexpensive autoregressive forecasts but can drift in turbulent regimes, whereas rolling diffusion and latent generative surrogates can represent stochastic transitions at the cost of multi-step denoising, noise-schedule design, or auxiliary compression models. We propose MeanFlow Long-term Invariant Spatiotemporal Consistency Autoregressive Models (MeLISA), a latent-free autoregressive generative surrogate built on pixel-space MeanFlow. MeLISA defines a blockwise stochastic transition kernel that generates each forecast block with a single model evaluation, avoiding latent encoders and iterative diffusion solvers at inference time. To stabilize long-horizon rollouts, MeLISA combines a Window-Consistency MeanFlow objective, which learns conditional spatiotemporal generation from partially observed temporal windows, with a Time Increment Consistency loss, which constrains multi-lag finite increments and targets temporal-correlation structure. We evaluate MeLISA with compact UNet and scalable DiT backbones on two high-resolution benchmarks: extended 2D Kolmogorov flow at $256 \times 256$ and a turbulent channel-flow slice at $192 \times 192$. MeLISA outperforms neural-operator baselines on short-term forecasting accuracy and long-horizon statistical metrics, including energy spectra, turbulent kinetic energy, and mixing-rate-related dynamics, while achieving inference speeds comparable to, and in some cases faster than, neural operators. Compact 3.7–5.7M-parameter variants already deliver strong parameter efficiency, and DiT variants provide a scalable path up to 150M parameters. Overall, MeLISA improves both rollout efficiency and long-horizon statistical accuracy.
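To make the two mechanisms named in the abstract more concrete, here is a minimal NumPy sketch of (a) a blockwise autoregressive rollout that calls a sampler once per forecast block and (b) a multi-lag finite-increment penalty. This is not the authors' implementation. The names `rollout`, `toy_sampler`, and `increment_loss`, the lag set `(1, 2, 4)`, and the choice of a block as long as the context window are all assumptions made for illustration. The trained MeanFlow network is replaced by a damped-copy-plus-noise stand-in, and the paper's actual Time Increment Consistency loss may be defined differently.

```python
import numpy as np


def toy_sampler(context, noise, decay=0.9, sigma=0.1):
    """Hypothetical stand-in for the trained one-step generator.

    Maps (context block, noise) to the next block in a single call.
    """
    return decay * context + sigma * noise


def rollout(sample_block, context, n_blocks, rng):
    """Blockwise autoregressive rollout: one sampler call per forecast block."""
    frames = [context]
    for _ in range(n_blocks):
        # Fresh noise for each block makes the transition stochastic.
        noise = rng.standard_normal(context.shape)
        block = sample_block(context, noise)
        frames.append(block)
        # Assumption: the newest block becomes the next context window.
        context = block
    return np.concatenate(frames, axis=0)  # stack along the time axis


def increment_loss(pred, target, lags=(1, 2, 4)):
    """Mean squared mismatch of finite increments x[t+k] - x[t], averaged over lags."""
    total = 0.0
    for k in lags:
        d_pred = pred[k:] - pred[:-k]
        d_true = target[k:] - target[:-k]
        total += np.mean((d_pred - d_true) ** 2)
    return total / len(lags)


rng = np.random.default_rng(0)
context = rng.standard_normal((8, 16, 16))  # (time, height, width)
traj = rollout(toy_sampler, context, n_blocks=3, rng=rng)
print(traj.shape)  # (32, 16, 16)
print(increment_loss(traj, traj))  # 0.0
```

Because the penalty compares differences between frames rather than the frames themselves, it is insensitive to a constant offset and responds only to how the field changes over each lag, which is one plausible way to target temporal-correlation structure.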