🤖 AI Summary
This work addresses the computational and memory bottlenecks of Transformers in long-sequence modeling, which stem from the quadratic complexity of self-attention. To overcome this limitation, the authors propose RMAAT, a novel architecture inspired by astrocytic mechanisms in the brain. RMAAT integrates segment-wise recurrent processing, adaptive memory compression grounded in astrocyte-mediated short- and long-term plasticity, linear-complexity attention, and a custom backpropagation algorithm—astrocytic memory replay—that is tailored for recurrent structures. For the first time, astrocytic plasticity is formalized into an efficient memory and attention module. Evaluated on the Long Range Arena benchmark, RMAAT achieves competitive accuracy while substantially improving computational and memory efficiency compared to existing approaches.
📝 Abstract
The quadratic complexity of the self-attention mechanism presents a significant impediment to applying Transformer models to long sequences. This work explores computational principles derived from astrocytes, glial cells critical for biological memory and synaptic modulation, as a complementary approach to conventional architectural modifications for efficient self-attention. We introduce the Recurrent Memory Augmented Astromorphic Transformer (RMAAT), an architecture integrating abstracted astrocyte functionalities. RMAAT employs a recurrent, segment-based processing strategy in which persistent memory tokens propagate contextual information. An adaptive compression mechanism, governed by a novel retention factor derived from simulated astrocyte long-term plasticity (LTP), modulates these tokens. Attention within segments uses an efficient, linear-complexity mechanism inspired by astrocyte short-term plasticity (STP). Training is performed using Astrocytic Memory Replay Backpropagation (AMRB), a novel algorithm designed for memory efficiency in recurrent networks. Evaluations on the Long Range Arena (LRA) benchmark demonstrate RMAAT's competitive accuracy and substantial improvements in computational and memory efficiency, indicating the potential of incorporating astrocyte-inspired dynamics into scalable sequence models.
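To make the abstract's pipeline concrete, here is a minimal NumPy sketch of the general pattern it describes: a sequence is split into segments, memory tokens are prepended to each segment, attention is computed with a kernelized linear-complexity mechanism, and the memory tokens are updated through a retention-weighted blend. This is not the paper's implementation: the `elu+1` feature map, the sigmoid `retention_factor`, and the function names are illustrative stand-ins for the STP-inspired attention and LTP-derived retention factor.

```python
import numpy as np

def linear_attention(q, k, v):
    """Kernelized attention: phi(q) @ (phi(k).T @ v), linear in sequence
    length, standing in for the paper's STP-inspired mechanism."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1 feature map
    q, k = phi(q), phi(k)
    kv = k.T @ v                                   # (d, d): cost O(n * d^2), not O(n^2)
    z = q @ k.sum(axis=0, keepdims=True).T + 1e-6  # per-row normalizer
    return (q @ kv) / z

def rmaat_segment_pass(x, num_segments, num_mem):
    """Toy segment-wise recurrence: memory tokens carry context across
    segments and are compressed by a retention factor r (a hypothetical
    stand-in for the LTP-derived factor)."""
    mem = np.zeros((num_mem, x.shape[1]))
    for seg in np.array_split(x, num_segments):
        inp = np.concatenate([mem, seg])           # prepend persistent memory tokens
        out = linear_attention(inp, inp, inp)      # attend within the segment
        r = 1.0 / (1.0 + np.exp(-out[:num_mem].mean()))  # retention factor in (0, 1)
        mem = r * mem + (1.0 - r) * out[:num_mem]  # adaptive memory compression
    return mem

x = np.random.default_rng(0).normal(size=(32, 8))  # 32 tokens, model dim 8
mem = rmaat_segment_pass(x, num_segments=4, num_mem=2)
```

Because each segment only attends over `num_mem + segment_len` tokens with a linear-cost kernel, total cost grows linearly with sequence length, which is the efficiency property the abstract claims.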