🤖 AI Summary
This work proposes Elastic Memory, a novel architecture addressing the quadratic complexity bottleneck of Transformers in long-context processing and the trade-off between theoretical rigor and scalability in existing recurrent memory methods. By modeling historical sequences as continuous signals and leveraging the HiPPO framework, the approach enables online optimal compression into a fixed-size memory state. A reconstructible polynomial sampling mechanism is introduced to flexibly recover historical summaries at test time. The key innovation lies in decoupling theoretically optimal compression from inductive biases during inference, thereby achieving both efficiency and adaptability. Experiments demonstrate that on tasks with 32k+ context lengths, Elastic Memory reduces memory usage by 16× compared to Memorizing Transformer at equal parameter counts and outperforms Melodi, a model with 30% more parameters. Moreover, even when scaled up by 4×, it maintains superior performance with faster inference.
📄 Abstract
Transformers face a quadratic bottleneck in attention when scaling to long contexts. Recent approaches introduce recurrent memory to extend context beyond the current window, yet these often face a fundamental trade-off between theoretical principles and practical scalability. To address this, we introduce Elastic Memory, a novel memory architecture grounded in the HiPPO framework for online function approximation. Elastic Memory treats historical sequences as samples from continuous signals, applying optimal online compression to encode them into a fixed-size memory state. For retrieval, we propose a flexible *polynomial sampling* mechanism that reconstructs a history summary from this compressed state. Elastic Memory consistently outperformed baselines on long-context (32k+) datasets across three domains. With equal parameters, it outperformed Memorizing Transformer while using 16× less memory, and it beat Melodi at all memory sizes, even when Melodi had 30% more parameters. When scaling model size, Elastic Memory stayed ahead of all baselines, and at 4× size it was significantly faster than Melodi. Furthermore, its decoupled design allows inductive biases to be injected at test time to boost performance.
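To make the abstract's mechanism concrete, here is a minimal sketch of the kind of HiPPO-style pipeline it describes: a 1-D signal is compressed online into a fixed-size coefficient state via the HiPPO-LegS recurrence (Gu et al., 2020), and a history summary is recovered by evaluating the resulting Legendre expansion. This is an illustrative toy, not the paper's implementation; the function names, the choice of the LegS variant, and the scalar-signal setting are all assumptions.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def hippo_legs_matrices(N):
    """HiPPO-LegS transition matrices A (N x N) and B (N,) from Gu et al., 2020."""
    A = np.zeros((N, N))
    B = np.sqrt(2 * np.arange(N) + 1.0)
    for n in range(N):
        for k in range(N):
            if n > k:
                A[n, k] = np.sqrt((2 * n + 1) * (2 * k + 1))
            elif n == k:
                A[n, k] = n + 1
    return A, B

def compress(f, N=8):
    """Online compression: one forward-Euler step of the LegS ODE per input value.

    Returns a fixed-size state c, regardless of len(f)."""
    A, B = hippo_legs_matrices(N)
    I = np.eye(N)
    c = np.zeros(N)
    for k, fk in enumerate(f, start=1):
        c = (I - A / k) @ c + (B / k) * fk
    return c

def reconstruct(c, num_points):
    """'Polynomial sampling': evaluate the compressed state's Legendre
    expansion at arbitrary points over the (rescaled) history [0, 1]."""
    xs = np.linspace(0.0, 1.0, num_points)
    out = np.zeros(num_points)
    for n, cn in enumerate(c):
        # Basis g_n(x) = sqrt(2n+1) * P_n(2x - 1) on [0, 1].
        out += cn * np.sqrt(2 * n + 1) * Legendre.basis(n)(2 * xs - 1)
    return out
```

The point of the decoupling mentioned above is that `compress` runs online with O(N) state independent of sequence length, while `reconstruct` is a free choice made at test time: one can sample the polynomial summary densely, coarsely, or non-uniformly without re-running compression.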