Easy attention: A simple attention mechanism for temporal predictions with transformers

📅 2023-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited robustness of transformers in chaotic time-series forecasting, this paper proposes easy attention: a lightweight attention mechanism that replaces the conventional query-key dot product and softmax with directly learnable attention scores. The authors argue that queries, keys, and softmax are not essential to self-attention, and use a singular-value decomposition (SVD) of the softmax attention score to show that standard self-attention compresses the contributions of queries and keys into the space spanned by the attention score. The transformer encoder is trained for both temporal reconstruction and prediction. Experiments on the Lorenz system, a turbulent shear flow, and a nuclear-reactor model show that the approach surpasses both standard transformers and LSTMs in prediction accuracy, while reducing computational complexity and improving robustness.
📝 Abstract
To improve the robustness of transformer neural networks used for temporal-dynamics prediction of chaotic systems, we propose a novel attention mechanism called easy attention, which we demonstrate in time-series reconstruction and prediction. While standard self-attention only makes use of the inner product of queries and keys, we demonstrate that the queries, keys, and softmax are not necessary for obtaining the attention score required to capture long-term dependencies in temporal sequences. Through singular-value decomposition (SVD) of the softmax attention score, we further observe that self-attention compresses the contributions from both queries and keys in the space spanned by the attention score. Therefore, our proposed easy-attention method directly treats the attention scores as learnable parameters. This approach produces excellent results when reconstructing and predicting the temporal dynamics of chaotic systems, exhibiting more robustness and lower complexity than self-attention or the widely used long short-term memory (LSTM) network. We show the improved performance of the easy-attention method on the Lorenz system, a turbulent shear flow, and a model of a nuclear reactor.
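The abstract's SVD observation can be illustrated with a toy calculation: the pre-softmax score matrix QKᵀ is built from factors of rank at most d, so for sequences longer than the model dimension the contributions of queries and keys are confined to a low-dimensional subspace. The following is a minimal NumPy sketch with random stand-in projections, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 4  # sequence length longer than model dimension
x = rng.standard_normal((n, d))
W_q = rng.standard_normal((d, d))  # stand-ins for trained query/key projections
W_k = rng.standard_normal((d, d))

q, k = x @ W_q, x @ W_k
scores = q @ k.T / np.sqrt(d)  # n x n matrix built from rank-<=d factors

# Although scores is n x n, its rank cannot exceed d < n: the query and
# key contributions are compressed into a d-dimensional subspace, which
# is what the paper's SVD analysis of the attention score makes explicit.
rank = np.linalg.matrix_rank(scores)
print(rank, "<=", d)
```

The softmax applied afterwards breaks the exact rank bound, but the SVD of the resulting attention score still reveals this compression, which motivates learning the score matrix directly.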
Problem

Research questions and friction points this paper is trying to address.

Improves robustness of transformers for chaotic temporal predictions
Simplifies attention mechanism by removing queries, keys, softmax
Enhances performance in chaotic systems like Lorenz and turbulence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes easy attention as simple alternative
Uses learnable parameters for attention scores
Improves robustness in chaotic systems prediction
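The mechanism summarized above can be sketched in a few lines: the attention-score matrix is a directly learnable parameter, so no queries, keys, or softmax are computed. All names and the initialization here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

class EasyAttention:
    """Minimal sketch of easy attention: the seq_len x seq_len attention
    score matrix `alpha` is itself a learnable parameter, replacing the
    query/key dot product and softmax of standard self-attention.
    (Hypothetical class and parameter names, random initialization.)"""

    def __init__(self, seq_len, d_model, rng=None):
        rng = rng or np.random.default_rng(0)
        # Learnable attention scores and value projection
        self.alpha = rng.standard_normal((seq_len, seq_len)) / np.sqrt(seq_len)
        self.W_v = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

    def __call__(self, x):
        # x: (seq_len, d_model). Mix time steps through the learned
        # scores, then project the values.
        return self.alpha @ (x @ self.W_v)

attn = EasyAttention(seq_len=16, d_model=8)
x = np.random.default_rng(1).standard_normal((16, 8))
y = attn(x)
print(y.shape)  # (16, 8)
```

Compared with standard self-attention, which trains three d x d projections and recomputes the softmax scores for every input, this sketch trains the scores once for a fixed sequence length, which is where the reduced complexity comes from.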
Marcial Sanchis-Agudo
FLOW, Engineering Mechanics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden.
Yuning Wang
FLOW, Engineering Mechanics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden.
Roger Arnau
Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València. Camino de Vera s/n, 46022 València, Spain.
L. Guastoni
FLOW, Engineering Mechanics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden.
Jasmin Lim
Department of Aerospace Engineering, University of Michigan, Ann Arbor, USA.
Karthik Duraisamy
University of Michigan
Computational Modeling · Multiscale Modeling · AI-Augmented Science · Turbulence Modeling & Simulations
Ricardo Vinuesa
Associate Professor, KTH Royal Institute of Technology
Artificial Intelligence · Simulation · Turbulent boundary layers · Flow control · Sustainability