CViT: Continuous Vision Transformer for Operator Learning

📅 2024-05-22
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses operator learning in scientific machine learning, aiming to construct continuous neural operators that generalize across arbitrary resolutions and accurately model solution mappings of partial differential equations (PDEs). To overcome limitations of conventional methods in approximating infinite-dimensional function spaces, the authors introduce Vision Transformers into the operator learning framework. The approach features grid-coordinate-based continuous positional encoding and query-driven cross-attention, enabling mesh-free, multi-scale, and resolution-consistent inference. The architecture couples a Vision Transformer encoder with an explicit functional-space parameterization, requiring neither pretraining nor rollout fine-tuning. CViT achieves state-of-the-art performance on diverse PDE benchmarks, including fluid dynamics, climate modeling, and reaction-diffusion equations, outperforming significantly larger foundation models in accuracy and generalization.

📝 Abstract
Operator learning, which aims to approximate maps between infinite-dimensional function spaces, is an important area in scientific machine learning with applications across various physical domains. Here we introduce the Continuous Vision Transformer (CViT), a novel neural operator architecture that leverages advances in computer vision to address challenges in learning complex physical systems. CViT combines a vision transformer encoder, a novel grid-based coordinate embedding, and a query-wise cross-attention mechanism to effectively capture multi-scale dependencies. This design allows for flexible output representations and consistent evaluation at arbitrary resolutions. We demonstrate CViT's effectiveness across a diverse range of partial differential equation (PDE) systems, including fluid dynamics, climate modeling, and reaction-diffusion processes. Our comprehensive experiments show that CViT achieves state-of-the-art performance on multiple benchmarks, often surpassing larger foundation models, even without extensive pretraining and roll-out fine-tuning. Taken together, CViT exhibits robust handling of discontinuous solutions, multi-scale features, and intricate spatio-temporal dynamics. Our contributions can be viewed as a significant step towards adapting advanced computer vision architectures for building more flexible and accurate machine learning models in the physical sciences.
Problem

Research questions and friction points this paper is trying to address.

Approximating maps between infinite-dimensional function spaces
Learning complex physical systems effectively
Capturing multi-scale spatio-temporal dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision transformer encoder
Grid-based coordinate embedding
Query-wise cross-attention mechanism
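The three innovations above fit together as encoder, coordinate embedding, and decoder. A minimal NumPy sketch of the decoding step follows: continuous query coordinates are embedded (here with a simple Fourier feature map standing in for the paper's grid-based coordinate embedding) and attend over the latent tokens produced by the ViT encoder, so the learned operator can be evaluated at arbitrary point sets independent of the training grid. All dimensions, weight shapes, and the embedding choice are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_embed(coords, num_freqs=4):
    """Map continuous (x, y) query coordinates to sin/cos features.

    Illustrative stand-in for CViT's grid-based coordinate embedding.
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi           # (F,)
    ang = coords[:, :, None] * freqs                        # (M, 2, F)
    feats = np.concatenate([np.sin(ang), np.cos(ang)], -1)  # (M, 2, 2F)
    return feats.reshape(coords.shape[0], -1)               # (M, 4F)

def cross_attention(q_feats, tokens, Wq, Wk, Wv):
    """Each query coordinate attends over the encoder's latent tokens."""
    Q, K, V = q_feats @ Wq, tokens @ Wk, tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                 # (M, N)
    scores -= scores.max(axis=-1, keepdims=True)            # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ V                                         # (M, d_out)

# Toy setting: N latent tokens from a (hypothetical) ViT encoder.
N, d, d_out, num_freqs = 64, 32, 1, 4
tokens = rng.standard_normal((N, d))
dq = 4 * num_freqs                                          # embed dim for 2D coords
Wq = rng.standard_normal((dq, d)) / np.sqrt(dq)
Wk = rng.standard_normal((d, d)) / np.sqrt(d)
Wv = rng.standard_normal((d, d_out)) / np.sqrt(d)

# Mesh-free evaluation: any set of query points, at any resolution.
coords = rng.random((100, 2))                               # 100 points in [0, 1]^2
out = cross_attention(fourier_embed(coords, num_freqs), tokens, Wq, Wk, Wv)
print(out.shape)                                            # (100, 1)
```

Because the query side depends only on coordinates, the same latent tokens can be decoded on a coarse grid, a fine grid, or scattered points, which is the mechanism behind the resolution-consistent inference claimed in the abstract.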
Sifan Wang
Postdoctoral fellow, Yale University
Scientific Machine Learning · AI for Science · Machine Learning · Deep Learning
Jacob H. Seidman
Graduate Program in Applied Mathematics and Computational Science, University of Pennsylvania
Shyam Sankaran
Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania
Hanwen Wang
Johns Hopkins University, SOM
Quantitative Systems Pharmacology · Oncology · Systems Biology
George J. Pappas
Department of Electrical and Systems Engineering, University of Pennsylvania
P. Perdikaris
Department of Mechanical Engineering and Applied Mechanics, University of Pennsylvania