NESTOR: A Nested MoE-based Neural Operator for Large-Scale PDE Pre-Training

📅 2026-02-25
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work proposes a nested mixture-of-experts (MoE) neural operator to address the limitations of existing architectures, which struggle to effectively model the heterogeneity and complex dependencies inherent in partial differential equation (PDE) systems due to their reliance on a single, fixed structure. The proposed approach introduces a novel two-level MoE framework—operating at both the image and token levels—to enable input-adaptive expert activation: the image-level MoE captures global dependencies, while the token-level MoE focuses on local dynamics. Trained jointly across twelve heterogeneous PDE datasets, the model demonstrates significantly enhanced generalization, computational efficiency, and cross-task transfer performance in multi-task settings, thereby advancing the scalability and effectiveness of large-scale pretraining for neural operators.

📝 Abstract
Neural operators have emerged as an efficient paradigm for solving PDEs, overcoming the limitations of traditional numerical methods and significantly improving computational efficiency. However, due to the diversity and complexity of PDE systems, existing neural operators typically rely on a single network architecture, which limits their capacity to fully capture heterogeneous features and complex system dependencies. This constraint poses a bottleneck for large-scale PDE pre-training based on neural operators. To address these challenges, we propose a large-scale PDE pre-trained neural operator based on a nested Mixture-of-Experts (MoE) framework. In particular, the image-level MoE is designed to capture global dependencies, while the token-level Sub-MoE focuses on local dependencies. Our model can selectively activate the most suitable expert networks for a given input, thereby enhancing generalization and transferability. We conduct large-scale pre-training on twelve PDE datasets from diverse sources and successfully transfer the model to downstream tasks. Extensive experiments demonstrate the effectiveness of our approach.
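The abstract describes a two-level routing scheme: an image-level gate that picks an expert from a global summary of the input, and a token-level Sub-MoE inside it that mixes sub-experts per token. The paper does not give implementation details, so the following is only a minimal numpy sketch of that idea, assuming top-1 routing at the image level and soft mixing at the token level; all class and parameter names (`NestedMoE`, `n_image_experts`, `n_sub_experts`) are illustrative, not the authors'.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class Expert:
    """A single feed-forward expert: Linear -> ReLU -> Linear."""
    def __init__(self, d, hidden):
        self.w1 = rng.standard_normal((d, hidden)) * 0.02
        self.w2 = rng.standard_normal((hidden, d)) * 0.02

    def __call__(self, x):
        return np.maximum(x @ self.w1, 0.0) @ self.w2

class NestedMoE:
    """Hypothetical nested MoE layer: the image-level gate routes the whole
    input (via mean-pooled tokens, i.e. global dependencies) to one coarse
    expert group; within that group, a token-level gate softly mixes
    sub-experts per token (local dependencies)."""
    def __init__(self, d, n_image_experts=2, n_sub_experts=2, hidden=32):
        self.image_gate = rng.standard_normal((d, n_image_experts)) * 0.02
        self.sub_gates = [rng.standard_normal((d, n_sub_experts)) * 0.02
                          for _ in range(n_image_experts)]
        self.sub_experts = [[Expert(d, hidden) for _ in range(n_sub_experts)]
                            for _ in range(n_image_experts)]

    def __call__(self, tokens):
        # tokens: (T, d) token embeddings for one discretized PDE field
        pooled = tokens.mean(axis=0)                     # global summary
        img_probs = softmax(pooled @ self.image_gate)    # image-level gate
        k = int(img_probs.argmax())                      # top-1 coarse expert
        tok_probs = softmax(tokens @ self.sub_gates[k])  # (T, n_sub) gate
        out = np.zeros_like(tokens)
        for j, expert in enumerate(self.sub_experts[k]):
            out += tok_probs[:, j:j + 1] * expert(tokens)  # weighted mix
        return tokens + out                              # residual connection

moe = NestedMoE(d=16)
y = moe(rng.standard_normal((64, 16)))
print(y.shape)  # (64, 16)
```

The key property the sketch illustrates is input-adaptive activation: different inputs pool to different global summaries, so they can activate different coarse experts, while the per-token gate lets local dynamics within one input reach different sub-experts.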
Problem

Research questions and friction points this paper is trying to address.

neural operators
PDE pre-training
Mixture-of-Experts
heterogeneous features
system dependencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Operator
Mixture-of-Experts
Large-Scale Pre-training
Partial Differential Equations
Nested MoE
Dengdi Sun
Anhui University
Machine Learning · Computer Vision
Xiaoya Zhou
School of Artificial Intelligence, Anhui University, Hefei, China
Xiao Wang
School of Computer Science and Technology, Anhui University, Hefei, China
Hao Si
School of Computer Science and Technology, Anhui University, Hefei, China
Wanli Lyu
School of Computer Science and Technology, Anhui University, Hefei, China
Jin Tang
Anhui University
Computer Vision · Intelligent Video Analysis
Bin Luo
Anhui University, University of York
Pattern Recognition · Digital Image Processing