Improved Operator Learning by Orthogonal Attention

📅 2023-10-19
🏛️ International Conference on Machine Learning
📈 Citations: 5
Influential: 3
📄 PDF
🤖 AI Summary
Neural operators suffer from overfitting and poor generalization in few-shot PDE solving due to parameter redundancy in standard attention mechanisms. Method: We propose Orthogonal Attention, the first attention design that integrates spectral decomposition of kernel integral operators with neural approximation of orthogonal eigenfunctions, thereby enforcing implicit regularization via intrinsic orthogonality. This eliminates high-dimensional parameter coupling inherent in softmax-based attention, substantially reducing model complexity. Results: Evaluated on six benchmark PDE tasks spanning regular and irregular geometries, our method achieves an average 12.6% higher accuracy than state-of-the-art neural operators—including FNO, TFNO, and GNO—while retaining over 92% of its performance under a 50% reduction in training data. It thus demonstrates superior robustness and data efficiency.
📝 Abstract
Neural operators, as an efficient surrogate model for learning the solutions of PDEs, have received extensive attention in the field of scientific machine learning. Among them, attention-based neural operators have become one of the mainstreams in related research. However, existing approaches overfit the limited training data due to the considerable number of parameters in the attention mechanism. To address this, we develop an orthogonal attention based on the eigendecomposition of the kernel integral operator and the neural approximation of eigenfunctions. The orthogonalization naturally poses a proper regularization effect on the resulting neural operator, which aids in resisting overfitting and boosting generalization. Experiments on six standard neural operator benchmark datasets comprising both regular and irregular geometries show that our method can outperform competing baselines with decent margins.
Problem

Research questions and friction points this paper is trying to address.

Attention Mechanism
Overfitting
Neural Operator Model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal Attention
Feature Decomposition
Generalization Improvement
🔎 Similar Papers
No similar papers found.
Zipeng Xiao
Zipeng Xiao
Shanghai Jiao Tong University
Deep learning
Zhongkai Hao
Zhongkai Hao
Tsinghua University
machine learningAI for Sciencephysics-informed machine learning
B
Bokai Lin
Qing Yuan Research Institute, SEIEE, Shanghai Jiao Tong University
Z
Zhijie Deng
Qing Yuan Research Institute, SEIEE, Shanghai Jiao Tong University
H
Hang Su
Dept. of Comp. Sci. & Tech., Tsinghua University