FLEX: A Backbone for Diffusion-Based Modeling of Spatio-temporal Physical Systems

📅 2025-05-23

🤖 AI Summary

This work addresses generative modeling of spatio-temporal physical systems, evaluated on high-resolution 2D turbulence. It targets three tasks: super-resolution, short-term forecasting, and calibrated uncertainty estimation. The authors propose FLEX, a backbone architecture for diffusion models. FLEX operates in residual space rather than on raw data, which the authors show theoretically reduces the variance of the diffusion model's velocity field and helps stabilize training. It integrates a latent-space Transformer into a U-Net built from convolutional ResNet layers, with a redesigned skip-connection scheme, so that the model captures both long-range dependencies and fine-grained local structure. A task-specific encoder processes auxiliary inputs such as coarse or past snapshots. Weak conditioning is applied to the shared encoder via skip connections to promote generalization, while strong conditioning is applied to the decoder through both skip and bottleneck features to ensure reconstruction fidelity. FLEX produces accurate super-resolution and forecasting results with as few as two reverse diffusion steps and yields calibrated uncertainty estimates through sampling. On high-resolution 2D turbulence data it outperforms strong baselines and generalizes to out-of-distribution settings, including unseen Reynolds numbers, physical observables (e.g., fluid flow velocity fields), and boundary conditions.

📝 Abstract

We introduce FLEX (FLow EXpert), a backbone architecture for generative modeling of spatio-temporal physical systems using diffusion models. FLEX operates in the residual space rather than on raw data, a modeling choice that we motivate theoretically, showing that it reduces the variance of the velocity field in the diffusion model, which helps stabilize training. FLEX integrates a latent Transformer into a U-Net with standard convolutional ResNet layers and incorporates a redesigned skip connection scheme. This hybrid design enables the model to capture both local spatial detail and long-range dependencies in latent space. To improve spatio-temporal conditioning, FLEX uses a task-specific encoder that processes auxiliary inputs such as coarse or past snapshots. Weak conditioning is applied to the shared encoder via skip connections to promote generalization, while strong conditioning is applied to the decoder through both skip and bottleneck features to ensure reconstruction fidelity. FLEX achieves accurate predictions for super-resolution and forecasting tasks using as few as two reverse diffusion steps. It also produces calibrated uncertainty estimates through sampling. Evaluations on high-resolution 2D turbulence data show that FLEX outperforms strong baselines and generalizes to out-of-distribution settings, including unseen Reynolds numbers, physical observables (e.g., fluid flow velocity fields), and boundary conditions.
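The abstract states that FLEX produces calibrated uncertainty estimates through sampling. A minimal sketch of one common way to check calibration is shown below, assuming repeated samples are drawn for the same conditioning input. The spread-skill ratio used here (ensemble spread scaled by sqrt((n+1)/n), divided by the RMSE of the ensemble mean) is a standard ensemble-forecasting diagnostic and not necessarily the metric the authors report. The toy "sampler" is a hypothetical stand-in that draws Gaussian perturbations around a fixed field; it is not FLEX.

```python
import numpy as np


def ensemble_mean_std(samples: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Pointwise ensemble mean and unbiased standard deviation.

    samples: array of shape (n_members, H, W), e.g. repeated draws from a
    generative sampler conditioned on the same input.
    """
    mean = samples.mean(axis=0)
    std = samples.std(axis=0, ddof=1)
    return mean, std


def spread_skill_ratio(samples: np.ndarray, truth: np.ndarray) -> float:
    """Spread-skill ratio; values near 1 indicate a calibrated ensemble.

    The sqrt((n + 1) / n) factor corrects for the finite ensemble size n.
    """
    n = samples.shape[0]
    mean, std = ensemble_mean_std(samples)
    rmse = np.sqrt(np.mean((mean - truth) ** 2))
    spread = np.sqrt(np.mean(std ** 2))
    return float(np.sqrt((n + 1) / n) * spread / rmse)


# Hypothetical stand-in for a sampler: ensemble members and the "truth"
# are perturbations of the same predictive mean field mu.
rng = np.random.default_rng(0)
n_members, size, sigma = 8, 64, 0.3
x = np.linspace(0.0, 2.0 * np.pi, size, endpoint=False)
mu = np.sin(x)[:, None] * np.cos(x)[None, :]

truth = mu + sigma * rng.standard_normal((size, size))
# Calibrated: members share the truth's spread around mu.
calibrated = mu + sigma * rng.standard_normal((n_members, size, size))
# Overconfident: members are four times too tightly clustered.
overconfident = mu + 0.25 * sigma * rng.standard_normal((n_members, size, size))

print(round(spread_skill_ratio(calibrated, truth), 2))
print(round(spread_skill_ratio(overconfident, truth), 2))
```

A ratio well below 1 signals an overconfident ensemble (too little spread for its error), and a ratio well above 1 signals an underconfident one.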

Problem

Research questions and friction points this paper is trying to address.

Modeling spatio-temporal physical systems with diffusion models

Reducing the variance of the diffusion model's velocity field to stabilize training

Improving spatio-temporal conditioning for accurate predictions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Residual-space modeling reduces the variance of the diffusion velocity field

Hybrid U-Net with a latent Transformer captures both local spatial detail and long-range dependencies

Task-specific encoder improves spatio-temporal conditioning
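The first innovation above, residual-space modeling, can be illustrated numerically. The sketch below assumes a flow-matching style velocity target v = x1 - x0 for the interpolation x_t = (1 - t) x0 + t x1, with x0 Gaussian noise independent of the data. Under that assumption Var(v) = Var(x1) + Var(x0), so shrinking the variance of the data target shrinks the variance of the velocity target. The residual for super-resolution is assumed here to be the fine field minus the upsampled coarse input (for forecasting, the analogous choice would be the next snapshot minus the current one). The exact parameterization and residual definition used by FLEX may differ; the functions `coarsen`, `upsample`, and `velocity_target` and the toy field are hypothetical.

```python
import numpy as np


def coarsen(field: np.ndarray, factor: int) -> np.ndarray:
    """Block-average a square field by an integer factor."""
    n = field.shape[0] // factor
    return field.reshape(n, factor, n, factor).mean(axis=(1, 3))


def upsample(field: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling back to the fine grid."""
    return np.repeat(np.repeat(field, factor, axis=0), factor, axis=1)


def velocity_target(x1: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Assumed flow-matching target v = x1 - x0 for x_t = (1 - t) x0 + t x1."""
    return x1 - noise


rng = np.random.default_rng(0)
size, factor = 128, 8
x = np.linspace(0.0, 2.0 * np.pi, size, endpoint=False)
# Hypothetical toy field: one large-scale mode plus weak small-scale noise.
fine = np.sin(x)[:, None] * np.cos(x)[None, :] + 0.1 * rng.standard_normal((size, size))

# Super-resolution residual: fine field minus the upsampled coarse input.
residual = fine - upsample(coarsen(fine, factor), factor)

# Share the same noise sample x0 so only the data target differs.
noise = rng.standard_normal((size, size))
var_raw = velocity_target(fine, noise).var()
var_res = velocity_target(residual, noise).var()
print(f"data var: raw={fine.var():.3f} residual={residual.var():.3f}")
print(f"velocity-target var: raw={var_raw:.3f} residual={var_res:.3f}")
```

Because the coarse input already carries the large-scale structure, the residual keeps mainly the small-scale content, and the velocity target's variance drops accordingly. In this toy the unit-variance noise still dominates Var(v), so the size of the reduction depends on how the data are normalized.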