Towards Compressive and Scalable Recurrent Memory

πŸ“… 2026-02-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work proposes Elastic Memory, a novel architecture that addresses the quadratic attention bottleneck of Transformers in long-context processing and the trade-off between theoretical rigor and scalability in existing recurrent memory methods. By modeling historical sequences as continuous signals and leveraging the HiPPO framework, the approach performs optimal online compression into a fixed-size memory state. A reconstructible polynomial sampling mechanism is introduced to flexibly recover historical summaries at test time. The key innovation lies in decoupling theoretically optimal compression from inference-time inductive biases, thereby achieving both efficiency and adaptability. Experiments demonstrate that on tasks with 32k+ context lengths, Elastic Memory reduces memory usage by 16× compared to Memorizing Transformer at equal parameter counts and outperforms Melodi, a model with 30% more parameters. Moreover, even when scaled up by 4×, it maintains superior performance with faster inference.

πŸ“ Abstract
Transformers face a quadratic attention bottleneck when scaling to long contexts. Recent approaches introduce recurrent memory to extend context beyond the current window, yet these often face a fundamental trade-off between theoretical principles and practical scalability. To address this, we introduce Elastic Memory, a novel memory architecture grounded in the HiPPO framework for online function approximation. Elastic Memory treats the historical sequence as samples from a continuous signal, applying optimal online compression to encode it into a fixed-size memory state. For retrieval, we propose a flexible polynomial sampling mechanism that reconstructs a history summary from this compressed state. Elastic Memory consistently outperformed baselines on long-context (32k+) datasets across three domains. With equal parameters, it outperformed Memorizing Transformer while using 16× less memory, and beat Melodi at all memory sizes, even when Melodi had 30% more parameters. When scaling model size, Elastic Memory stayed ahead of all baselines and was significantly faster than Melodi at 4× scale. Furthermore, its decoupled design allows inductive biases to be injected at test time to boost performance.
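The core mechanism the abstract describes builds on HiPPO's online function approximation. As a rough illustration only (this is a generic HiPPO-LegS sketch, not the paper's actual Elastic Memory implementation), the snippet below compresses a scalar "history" signal online into a fixed number of Legendre coefficients and then reconstructs the history from that fixed-size state, loosely analogous to the paper's polynomial sampling readout. The transition matrices follow Gu et al.'s HiPPO-LegS definition; all names, sizes, and the toy signal are illustrative assumptions.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def hippo_legs_matrices(N):
    """HiPPO-LegS transition matrices (Gu et al., 2020)."""
    A = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i > j:
                A[i, j] = np.sqrt((2 * i + 1) * (2 * j + 1))
            elif i == j:
                A[i, j] = i + 1
    B = np.sqrt(2 * np.arange(N) + 1)
    return A, B

def compress(signal, N=32):
    """Online compression of a 1-D signal into N Legendre coefficients.

    Uses the bilinear discretization of dc/dt = -A c / t + B f / t,
    updating a fixed-size state c once per incoming sample.
    """
    A, B = hippo_legs_matrices(N)
    I = np.eye(N)
    c = np.zeros(N)
    for k, f in enumerate(signal, start=1):
        c = np.linalg.solve(I + A / (2 * k),
                            (I - A / (2 * k)) @ c + (B / k) * f)
    return c

def reconstruct(c, num_points):
    """Polynomial readout: evaluate the Legendre expansion over the history.

    num_points is chosen at read time, independent of the state size.
    """
    N = len(c)
    s = np.linspace(-1, 1, num_points)       # rescaled time axis [0, T] -> [-1, 1]
    scale = np.sqrt(2 * np.arange(N) + 1)    # orthonormalization factors
    return legval(s, c * scale)

T = 512
t = np.linspace(0, 1, T)
sig = np.sin(6 * np.pi * t) * np.exp(-t)     # toy smooth "history" signal
c = compress(sig, N=32)                       # 512 samples -> 32 numbers
approx = reconstruct(c, T)
err = np.max(np.abs(approx - sig))
```

Note how the 512-step history is held in just 32 coefficients, and the reconstruction grid (`num_points`) is a free choice at read time; this separation between a fixed compressed state and a flexible readout is the kind of decoupling the abstract attributes to Elastic Memory.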
Problem

Research questions and friction points this paper is trying to address.

long-context
attention bottleneck
recurrent memory
scalability
Transformer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Elastic Memory
HiPPO
online compression
polynomial sampling
recurrent memory
Yunchong Song
Ph.D. student, Shanghai Jiao Tong University
Machine Learning
Jushi Kai
Shanghai Jiao Tong University
Language Modeling · LLM · Long Context
Liming Lu
LUMIA Lab, School of Artificial Intelligence, Shanghai Jiao Tong University
Kaixi Qiu
LUMIA Lab, School of Artificial Intelligence, Shanghai Jiao Tong University
Zhouhan Lin
LUMIA Lab, School of Artificial Intelligence, Shanghai Jiao Tong University; Shanghai Artificial Intelligence Laboratory; Shanghai Innovation Institute