VICON: A Foundation Model for Multi-Physics Fluid Dynamics via Vision In-Context Operator Networks

📅 2024-11-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In-Context Operator Networks are computationally inefficient and scale poorly on dense, high-dimensional data, which limits their use on two-dimensional functions. This work integrates a Vision Transformer into the in-context operator learning framework and represents functions patch-wise for multi-physics fluid modeling. The approach supports dynamic context construction, handles variable time steps and sparse-frame inputs, and uses multi-physics pretraining to improve generalization. On two compressible-flow benchmark datasets, it reduces the rescaled L² error by 40% and 61.6%, respectively, compared with the state-of-the-art MPP model, and needs about one-third of MPP's inference time per frame in long-horizon rollout prediction.
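The error figures above are relative reductions of a normalized L² error. As a minimal sketch of such a metric, assuming the common definition ‖pred − target‖₂ / ‖target‖₂ (the paper's exact rescaling is not given in this summary, and the function name `relative_l2_error` is hypothetical):

```python
import numpy as np

def relative_l2_error(pred, target, eps=1e-12):
    """L2 norm of the error divided by the L2 norm of the target.

    Assumed definition for illustration; eps guards against a zero target.
    """
    diff = np.linalg.norm((pred - target).ravel())
    ref = np.linalg.norm(target.ravel())
    return diff / (ref + eps)

# Toy check: a prediction that is uniformly 10% too large.
target = np.ones((4, 4))
pred = 1.1 * target
err = relative_l2_error(pred, target)
print(err)  # approximately 0.1

# A 40% reduction means the new error is 0.6 times the baseline error.
baseline_err = 0.05  # hypothetical baseline value, not from the paper
reduced_err = (1 - 0.40) * baseline_err
```

Under this reading, the reported 40% and 61.6% are reductions relative to the baseline's error, not absolute error values.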

📝 Abstract

In-Context Operator Networks (ICONs) are models that learn operators across different types of PDEs using a few-shot, in-context approach. Although they show successful generalization to various PDEs, existing methods treat each data point as a single token, and suffer from computational inefficiency when processing dense data, limiting their application in higher spatial dimensions. In this work, we propose Vision In-Context Operator Networks (VICON), incorporating a vision transformer architecture that efficiently processes 2D functions through patch-wise operations. We evaluated our method on three fluid dynamics datasets, demonstrating both superior performance (reducing the rescaled $L^2$ error by $40\%$ and $61.6\%$ for two benchmark datasets for compressible flows, respectively) and computational efficiency (requiring only one-third of the inference time per frame) in long-term rollout predictions compared to the current state-of-the-art sequence-to-sequence model with fixed timestep prediction: Multiple Physics Pretraining (MPP). Compared to MPP, our method preserves the benefits of in-context operator learning, enabling flexible context formation when dealing with insufficient frame counts or varying timestep values.
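To illustrate why patch-wise tokenization helps with dense 2D data, here is a minimal NumPy sketch that splits one frame into non-overlapping patches. The grid size (128 × 128), channel count (3), and patch size (16) are assumptions chosen for the example, and `patchify` is a hypothetical helper, not the paper's implementation.

```python
import numpy as np

def patchify(field, patch):
    """Split an (H, W, C) field into non-overlapping patch tokens.

    Returns an array of shape (num_patches, patch * patch * C).
    """
    h, w, c = field.shape
    if h % patch or w % patch:
        raise ValueError("H and W must be divisible by the patch size")
    gh, gw = h // patch, w // patch
    # (gh, patch, gw, patch, C) -> (gh, gw, patch, patch, C)
    blocks = field.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return blocks.reshape(gh * gw, patch * patch * c)

# Assumed example sizes: a 128 x 128 grid with 3 physical channels.
field = np.arange(128 * 128 * 3, dtype=np.float32).reshape(128, 128, 3)
tokens = patchify(field, patch=16)

print(tokens.shape)  # (64, 768)
print(128 * 128, "point-wise tokens vs", tokens.shape[0], "patch tokens")
```

With one token per data point, this frame would contribute 16,384 tokens; with 16 × 16 patches it contributes 64. Since self-attention cost grows quadratically with sequence length, the shorter sequence is what makes dense 2D inputs tractable.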

Problem

Research questions and friction points this paper is trying to address.

Efficiently process dense data

Improve long-term rollout predictions

Enable flexible context formation
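Long-term rollout prediction can be sketched as repeatedly feeding a one-step predictor its own output. The example below is a generic autoregressive loop; `step_fn` is a stand-in for any learned one-step operator, and the toy `decay` function is purely an assumption so that the snippet runs on its own.

```python
import numpy as np

def rollout(step_fn, u0, num_steps):
    """Apply step_fn autoregressively, returning all frames including u0."""
    frames = [u0]
    for _ in range(num_steps):
        # Each prediction becomes the input for the next step,
        # so errors can accumulate over the horizon.
        frames.append(step_fn(frames[-1]))
    return np.stack(frames)

# Toy stand-in for a learned operator: halve the field at every step.
def decay(u):
    return 0.5 * u

traj = rollout(decay, np.ones((2, 2)), num_steps=3)
print(traj.shape)     # (4, 2, 2)
print(traj[:, 0, 0])  # values 1, 0.5, 0.25, 0.125
```

Because each step consumes the previous prediction, a predictor that can take a larger time step needs fewer steps to reach the same horizon, which reduces both compute and the number of chances for error to compound.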

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision transformer enhances 2D processing

Patch-wise operations improve computational efficiency

In-context learning for flexible PDE solutions
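Flexible context formation can be pictured as pairing frames from a trajectory at a chosen stride, where each pair is an example of "state now" and "state one stride later". The sketch below is an assumption about how such pairs might be assembled; `build_context_pairs` and its parameters are hypothetical names, not the paper's API.

```python
import numpy as np

def build_context_pairs(frames, stride, max_pairs=None):
    """Pair each frame with the frame `stride` steps later.

    frames: array of shape (T, H, W, C).
    Returns (conditions, qois), each of shape (N, H, W, C),
    with qois[i] equal to the frame `stride` steps after conditions[i].
    """
    t = frames.shape[0]
    if stride < 1 or stride >= t:
        raise ValueError("stride must satisfy 1 <= stride < number of frames")
    conditions = frames[: t - stride]
    qois = frames[stride:]
    if max_pairs is not None:
        # Keep only the most recent pairs when the context budget is limited.
        conditions, qois = conditions[-max_pairs:], qois[-max_pairs:]
    return conditions, qois

# Six frames of an 8 x 8 single-channel field, filled with the frame index.
frames = np.stack([np.full((8, 8, 1), float(i)) for i in range(6)])

cond1, qoi1 = build_context_pairs(frames, stride=1)
cond2, qoi2 = build_context_pairs(frames, stride=2, max_pairs=3)
print(cond1.shape[0], "pairs at stride 1;", cond2.shape[0], "pairs at stride 2")
```

The same six frames yield five pairs at stride 1 or up to four at stride 2, so the time step the model is asked to emulate is set by how the context is built rather than fixed in the architecture. A short trajectory simply produces fewer pairs.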

🔎 Similar Papers

CViT: Continuous Vision Transformer for Operator Learning