Linear Attention for Efficient Bidirectional Sequence Modeling

📅 2025-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge that linear attention mechanisms, while fast to train and cheap to run in the causal setting, have not been extended to bidirectional sequence modeling. To resolve this, the authors propose LION, a framework that establishes a theoretical foundation for bidirectional linear attention by constructing a bidirectional RNN exactly equivalent to full linear attention, preserving parallelizable training and linear-time inference. The key contributions are threefold: (1) an exact bidirectional RNN equivalence for full linear attention; (2) three variants cast into this bidirectional form: LION-LIT (the linear transformer of Katharopoulos et al., 2020), LION-D (extending RetNet), and LION-S (a linear transformer with a stable selective mask inspired by state space models, SSMs); and (3) empirical results showing performance on par with standard Transformers and SSMs on bidirectional tasks while significantly accelerating training. The implementation is publicly released.

📝 Abstract
Transformers with linear attention enable fast and parallel training. Moreover, they can be formulated as Recurrent Neural Networks (RNNs) for efficient linear-time inference. While extensively evaluated in causal sequence modeling, they have yet to be extended to the bidirectional setting. This work introduces the LION framework, establishing new theoretical foundations for linear transformers in bidirectional sequence modeling. LION constructs a bidirectional RNN equivalent to full Linear Attention. This extends the benefits of linear transformers (parallel training and efficient inference) to the bidirectional setting. Using LION, we cast three linear transformers to their bidirectional form: LION-LIT, the bidirectional variant corresponding to (Katharopoulos et al., 2020); LION-D, extending RetNet (Sun et al., 2023); and LION-S, a linear transformer with a stable selective mask inspired by the selectivity of SSMs (Dao & Gu, 2024). Replacing the attention block with LION (-LIT, -D, -S) achieves performance on bidirectional tasks that approaches that of Transformers and State-Space Models (SSMs), while delivering significant improvements in training speed. Our implementation is available at http://github.com/LIONS-EPFL/LION.
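The core idea in the abstract, that unmasked linear attention over a whole sequence can be computed as a forward plus a backward recurrence, can be illustrated with a small sketch. This is not the paper's exact LION formulation (which adds decay/selective masks); it is a minimal, assumed NumPy illustration of the equivalence between full (non-causal) linear attention and two RNN-style passes, subtracting the doubly counted diagonal term:

```python
import numpy as np

def phi(x):
    # ELU(x) + 1 feature map, as in Katharopoulos et al. (2020)
    return np.where(x > 0, x + 1.0, np.exp(x))

def bidirectional_linear_attention(Q, K, V):
    """Two-recurrence form of unmasked bidirectional linear attention.

    out_i = sum_j phi(q_i).phi(k_j) v_j / sum_j phi(q_i).phi(k_j),
    split into a forward pass (j <= i) and a backward pass (j >= i).
    Illustrative only; LION's masked variants differ.
    """
    L, d = Q.shape
    dv = V.shape[1]
    Qf, Kf = phi(Q), phi(K)

    # Forward recurrence: state accumulates phi(k_j) v_j^T for j <= i.
    S, z = np.zeros((d, dv)), np.zeros(d)
    fwd_num, fwd_den = np.zeros((L, dv)), np.zeros(L)
    for i in range(L):
        S += np.outer(Kf[i], V[i])
        z += Kf[i]
        fwd_num[i] = Qf[i] @ S
        fwd_den[i] = Qf[i] @ z

    # Backward recurrence: same state, accumulated for j >= i.
    S, z = np.zeros((d, dv)), np.zeros(d)
    bwd_num, bwd_den = np.zeros((L, dv)), np.zeros(L)
    for i in reversed(range(L)):
        S += np.outer(Kf[i], V[i])
        z += Kf[i]
        bwd_num[i] = Qf[i] @ S
        bwd_den[i] = Qf[i] @ z

    # Combine passes; the j = i term appears in both, so subtract it once.
    out = np.zeros((L, dv))
    for i in range(L):
        diag = Qf[i] @ Kf[i]
        num = fwd_num[i] + bwd_num[i] - diag * V[i]
        den = fwd_den[i] + bwd_den[i] - diag
        out[i] = num / den
    return out
```

Because each pass keeps only a d-by-dv state matrix, memory and time stay linear in sequence length, which is the efficiency benefit the abstract describes.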
Problem

Research questions and friction points this paper is trying to address.

How can linear transformers be extended to bidirectional sequence modeling?
Can parallel training and linear-time inference be preserved in the bidirectional setting?
What theoretical foundation connects bidirectional linear attention to RNNs?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exact bidirectional RNN equivalence for full linear attention
Three bidirectional variants: LION-LIT, LION-D (extending RetNet), and LION-S (selective mask inspired by SSMs)
Publicly released implementation with significant training-speed improvements at comparable accuracy
👥 Authors
Arshia Afzal, LIONS, EPFL, Switzerland; Integrated Neurotechnologies Laboratory, EPFL, Switzerland
Elias Abad Rocamora, EPFL. Interests: Deep Learning, Robustness Verification, Adversarial Robustness in NLP
Leyla Naz Candogan, LIONS, EPFL, Switzerland
Pol Puigdemont, EPFL. Interests: Machine Learning
Francesco Tonin, LIONS, EPFL, Switzerland
Yongtao Wu, EPFL. Interests: Trustworthy Machine Learning, Optimization
Mahsa Shoaran, Associate Professor, EPFL, Switzerland. Interests: Neural Interfacing, Biomedical Circuits, Machine Learning Hardware, Neuroengineering
V. Cevher, LIONS, EPFL, Switzerland