Transolver is a Linear Transformer: Revisiting Physics-Attention through the Lens of Linear Attention

📅 2025-11-09
🤖 AI Summary
Standard attention mechanisms in Transformer-based neural operators incur quadratic computational complexity, hindering scalability for PDE solving. Method: This work identifies that the Physics-Attention in Transolver is a special case of linear attention and reveals that its performance gain stems from slicing and unslicing operations—not inter-slice interactions. Leveraging this insight, we propose a two-step transformation to unify it with standard linear attention, yielding LinearNO: a lightweight, efficient neural operator built upon a slicing-projection → linear-attention → unslicing-reconstruction paradigm within the Transformer framework. Contribution/Results: LinearNO achieves state-of-the-art performance on six canonical PDE benchmarks, reducing parameters by 40.0% and computational cost by 36.2% on average. It also significantly outperforms existing methods on industrial datasets—AirfRANS and Shape-Net Car—demonstrating strong generalization and practical efficacy.
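The slicing-projection → attention → unslicing-reconstruction pipeline described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the authors' implementation: the shapes (N points, M slices, C channels), the single-head form, and the softmax slice-assignment weights are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, M, C = 1024, 32, 64                 # grid points, slices, channels (illustrative sizes)
x = rng.standard_normal((N, C))        # point features
W_slice = rng.standard_normal((C, M))  # assumed learned projection

# Slicing: soft-assign each of the N points to M << N slice tokens.
w = softmax(x @ W_slice, axis=-1)              # (N, M) assignment weights
slices = w.T @ x / (w.sum(0)[:, None] + 1e-9)  # (M, C) weighted slice tokens

# Slice attention (the step the paper argues contributes little):
attn = softmax(slices @ slices.T / np.sqrt(C), axis=-1)  # (M, M)
slices_out = attn @ slices

# Unslicing: map slice tokens back to the N grid points.
x_out = w @ slices_out                 # (N, C)
```

Because attention acts only among the M slice tokens, the cost is O(N·M·C + M²·C) rather than the O(N²·C) of dense point-wise attention.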

📝 Abstract
Recent advances in Transformer-based Neural Operators have enabled significant progress in data-driven solvers for Partial Differential Equations (PDEs). Much current research focuses on reducing the quadratic complexity of attention, which limits training and inference efficiency. Among these works, Transolver stands out as a representative method that introduces Physics-Attention to reduce computational costs. Physics-Attention projects grid points into slices for slice attention, then maps them back through deslicing. However, we observe that Physics-Attention can be reformulated as a special case of linear attention, and that slice attention may even hurt model performance. Based on these observations, we argue that its effectiveness arises primarily from the slice and deslice operations rather than from interactions between slices. Building on this insight, we propose a two-step transformation that redesigns Physics-Attention into canonical linear attention, which we call the Linear Attention Neural Operator (LinearNO). Our method achieves state-of-the-art performance on six standard PDE benchmarks while reducing the number of parameters by an average of 40.0% and computational cost by 36.2%. Additionally, it delivers superior performance on two challenging, industrial-level datasets: AirfRANS and Shape-Net Car.
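The "special case of linear attention" claim rests on a standard identity: because matrix multiplication is associative, softmax attention's φ(Q)(φ(K)ᵀV) form can be computed without ever materializing the N×N score matrix. The sketch below shows this generic trick, not the paper's exact LinearNO layer; the shapes and the shifted-ReLU feature map φ are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, C = 2048, 32
Q = rng.standard_normal((N, C))
K = rng.standard_normal((N, C))
V = rng.standard_normal((N, C))

# An assumed positive feature map (kept positive so the normalizer is nonzero).
phi = lambda t: np.maximum(t, 0) + 1e-6

num_quad = (phi(Q) @ phi(K).T) @ V   # O(N^2 C): materializes an N x N score matrix
num_lin  = phi(Q) @ (phi(K).T @ V)   # O(N C^2): same product, reassociated
den = phi(Q) @ phi(K).sum(axis=0)    # shared per-point normalizer, shape (N,)
out = num_lin / den[:, None]         # normalized linear-attention output, (N, C)
```

The two numerators agree up to floating-point rounding; only the right-hand grouping avoids the quadratic intermediate, which is what makes linear attention scale to large point clouds.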
Problem

Research questions and friction points this paper is trying to address.

Reformulating Physics-Attention as linear attention to improve efficiency
Reducing computational costs and parameters in PDE neural operators
Enhancing model performance on standard and industrial PDE datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates Physics-Attention as linear attention
Reduces parameters by 40.0% and computational cost by 36.2% on average
Achieves state-of-the-art performance on PDE benchmarks
Wenjie Hu
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Sidun Liu
National University of Defense Technology (machine learning)
Peng Qiao
National University of Defense Technology (image processing, computer vision, machine learning, deep learning)
Zhenglun Sun
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China
Yong Dou
National Key Laboratory of Parallel and Distributed Computing, College of Computer Science and Technology, National University of Defense Technology, Changsha, China