CViT: Continuous Vision Transformer for Operator Learning

📅 2024-05-22
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses operator learning in scientific machine learning, aiming to construct continuous neural operators that generalize across arbitrary resolutions and accurately model solution mappings of partial differential equations (PDEs). To overcome limitations of conventional methods in approximating infinite-dimensional function spaces, the authors introduce Vision Transformers into the operator learning framework. The approach features grid-coordinate-based continuous positional encoding and query-driven cross-attention, enabling mesh-free, multi-scale, and resolution-consistent inference. The architecture couples a Vision Transformer encoder with an explicit functional-space parameterization, requiring neither pretraining nor rollout fine-tuning. CViT achieves state-of-the-art performance on diverse PDE benchmarks, including fluid dynamics, climate modeling, and reaction-diffusion equations, outperforming significantly larger foundation models in accuracy and generalization.

📝 Abstract
Operator learning, which aims to approximate maps between infinite-dimensional function spaces, is an important area in scientific machine learning with applications across various physical domains. Here we introduce the Continuous Vision Transformer (CViT), a novel neural operator architecture that leverages advances in computer vision to address challenges in learning complex physical systems. CViT combines a vision transformer encoder, a novel grid-based coordinate embedding, and a query-wise cross-attention mechanism to effectively capture multi-scale dependencies. This design allows for flexible output representations and consistent evaluation at arbitrary resolutions. We demonstrate CViT's effectiveness across a diverse range of partial differential equation (PDE) systems, including fluid dynamics, climate modeling, and reaction-diffusion processes. Our comprehensive experiments show that CViT achieves state-of-the-art performance on multiple benchmarks, often surpassing larger foundation models, even without extensive pretraining and roll-out fine-tuning. Taken together, CViT exhibits robust handling of discontinuous solutions, multi-scale features, and intricate spatio-temporal dynamics. Our contributions can be viewed as a significant step towards adapting advanced computer vision architectures for building more flexible and accurate machine learning models in the physical sciences.
Problem

Research questions and friction points this paper is trying to address.

Approximating maps between infinite-dimensional function spaces
Learning complex physical systems effectively
Capturing multi-scale spatio-temporal dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision transformer encoder
Grid-based coordinate embedding
Query-wise cross-attention mechanism
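The three innovations above fit together as encoder, coordinate embedding, and decoder. A minimal NumPy sketch of the decoding step follows: continuous query coordinates are embedded (here with a simple Fourier feature map standing in for the paper's grid-based coordinate embedding) and attend over the latent tokens produced by the ViT encoder, so the learned operator can be evaluated at arbitrary point sets independent of the training grid. All dimensions, weight shapes, and the embedding choice are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_embed(coords, num_freqs=4):
    """Map continuous (x, y) query coordinates to sin/cos features.

    Illustrative stand-in for CViT's grid-based coordinate embedding.
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi           # (F,)
    ang = coords[:, :, None] * freqs                        # (M, 2, F)
    feats = np.concatenate([np.sin(ang), np.cos(ang)], -1)  # (M, 2, 2F)
    return feats.reshape(coords.shape[0], -1)               # (M, 4F)

def cross_attention(q_feats, tokens, Wq, Wk, Wv):
    """Each query coordinate attends over the encoder's latent tokens."""
    Q, K, V = q_feats @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                 # (M, N)
    scores -= scores.max(axis=-1, keepdims=True)            # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ V                                         # (M, d_out)

# Toy setting: N latent tokens from a (hypothetical) ViT encoder.
N, d, d_out, num_freqs = 64, 32, 1, 4
tokens = rng.standard_normal((N, d))
dq = 4 * num_freqs                                          # embed dim for 2D coords
Wq = rng.standard_normal((dq, d)) / np.sqrt(dq)
Wk = rng.standard_normal((d, d)) / np.sqrt(d)
Wv = rng.standard_normal((d, d_out)) / np.sqrt(d)

# Mesh-free evaluation: any set of query points, at any resolution.
coords = rng.random((100, 2))                               # 100 points in [0, 1]^2
out = cross_attention(fourier_embed(coords, num_freqs), tokens, Wq, Wk, Wv)
print(out.shape)                                            # (100, 1)
```

Because the query side depends only on coordinates, the same latent tokens can be decoded on a coarse grid, a fine grid, or scattered points, which is the mechanism behind the resolution-consistent inference claimed in the abstract.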
Sifan Wang
Postdoctoral fellow, Yale University
Scientific Machine Learning · AI for Science · Machine Learning · Deep Learning
Jacob H. Seidman
Graduate Program in Applied Mathematics and Computational Science, University of Pennsylvania
Shyam Sankaran
Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania
Hanwen Wang
Johns Hopkins University, SOM
Quantitative Systems Pharmacology · Oncology · Systems Biology
George J. Pappas
Department of Electrical and Systems Engineering, University of Pennsylvania
P. Perdikaris
Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania