🤖 AI Summary
This work addresses two limitations: the memory bottleneck that linearly growing KV caches impose on Transformers, and the collapse in retrieval accuracy that state-space models (SSMs) suffer once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. The authors propose Echo, a KV-cache-free architecture that introduces a spectral Koopman operator into sequence modeling. Its core component, Spectral Koopman Attention (SKA), is a drop-in replacement for attention layers that augments SSM blocks and performs long-range associative recall in constant memory. Echo fits a spectral linear system to the key and value history via kernel ridge regression and retrieves through a learned power-iterated filter, all from an O(r²) streaming state, where r is a small projection rank. At the 50M-parameter scale, SKA-augmented models reach 100% retrieval accuracy on every Multi-Query Associative Recall configuration tested, whereas a pure Mamba-2 SSM stays at chance accuracy (about 3%). On five additional transfer benchmarks, SKA consistently outperforms both pure SSMs and SSM+Attention hybrids while maintaining constant inference memory.
📝 Abstract
Long chain-of-thought reasoning and agentic tool-calling produce traces spanning tens of thousands of tokens, yet Transformer KV caches grow linearly with sequence length, creating a memory bottleneck on commodity hardware. State-space models offer constant-memory recurrence but suffer a memory cliff: retrieval accuracy collapses once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. We introduce Echo, a KV-cache-free associative recall architecture built around Spectral Koopman Attention (SKA), a drop-in replacement for attention layers that augments SSM blocks with a closed-form dynamical operator whose sufficient statistics are accumulated in constant memory with no KV cache. Echo fits a spectral linear system to the key and value history via kernel ridge regression and retrieves through a learned power-iterated filter, all from $O(r^{2})$ streaming state, where $r$ is a small projection rank. On the Multi-Query Associative Recall benchmark, a pure Mamba-2 SSM fails to exceed chance accuracy (${\sim}3\%$) across all gap lengths and KV-pair counts, while at the 50M-parameter scale SKA-augmented models achieve $100\%$ retrieval accuracy on every configuration tested, including distractor gaps of $4{,}096$ tokens with $32$ KV pairs. Across five additional transfer benchmarks, including needle-in-a-haystack, tool-trace, and multi-hop retrieval, SKA consistently outperforms both pure SSM and SSM+Attention hybrids while maintaining constant inference memory. Ablations confirm that the spectral operator, not the prefix masking strategy, drives the retrieval gain.
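To make the idea of constant-memory sufficient statistics concrete, the following is a minimal sketch of a streaming ridge-regression associative memory. It is not the authors' SKA implementation: it uses a plain linear kernel and omits the Koopman spectral operator and the learned power-iterated filter. The class name `StreamingRidgeMemory`, its methods, and the `ridge` default are hypothetical, chosen only for illustration. The sketch shows just one point: key-value pairs can be folded into two $r \times r$ matrices whose size does not depend on sequence length, and a value can then be recovered by solving a regularized linear system.

```python
import numpy as np


class StreamingRidgeMemory:
    """Hypothetical illustration: ridge regression from keys to values,
    stored as O(r^2) sufficient statistics rather than a KV cache."""

    def __init__(self, rank: int, ridge: float = 1e-6):
        self.rank = rank
        self.ridge = ridge                    # assumed regularization strength
        self.gram = np.zeros((rank, rank))    # running sum of k k^T
        self.cross = np.zeros((rank, rank))   # running sum of v k^T

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        # Fold one (key, value) pair into the statistics; nothing is appended.
        self.gram += np.outer(key, key)
        self.cross += np.outer(value, key)

    def read(self, query: np.ndarray) -> np.ndarray:
        # Ridge solution: v_hat = C (G + ridge * I)^{-1} q
        regularized = self.gram + self.ridge * np.eye(self.rank)
        return self.cross @ np.linalg.solve(regularized, query)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rank, num_pairs = 16, 8
    keys = rng.standard_normal((num_pairs, rank))
    values = rng.standard_normal((num_pairs, rank))

    memory = StreamingRidgeMemory(rank)
    for k, v in zip(keys, values):
        memory.write(k, v)

    # State size is fixed by the rank, not by the number of pairs written.
    print(memory.gram.shape, memory.cross.shape)
    # Reconstruction error for the value stored under keys[5].
    print(np.max(np.abs(memory.read(keys[5]) - values[5])))
```

In this simplified linear setting, recall is close to exact only while the stored keys are linearly independent, which requires at most $r$ distinct keys. How Echo's kernel features, spectral operator, and filter behave beyond that regime is not something this sketch attempts to model.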