Echo: KV-Cache-Free Associative Recall with Spectral Koopman Operators

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations: the memory bottleneck caused by Transformer KV caches, which grow linearly with sequence length, and the sharp decline in the recall accuracy of state-space models (SSMs) once the gap between a stored fact and its query becomes long. The authors propose Echo, an architecture built around Spectral Koopman Attention (SKA), which augments SSM blocks with a closed-form spectral Koopman operator and performs long-range associative recall in constant memory without a KV cache. Echo combines kernel ridge regression to fit a spectral linear system, a learned power-iterated filter for retrieval, and O(r²) streaming state. At the 50M-parameter scale, SKA-augmented models reach 100% retrieval accuracy on every tested configuration of the Multi-Query Associative Recall benchmark. On five additional transfer benchmarks, SKA outperforms both pure SSMs and SSM+Attention hybrids while maintaining constant inference memory.
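To make the memory contrast concrete, the following back-of-the-envelope sketch compares a KV cache that grows with sequence length against a fixed-size state of order r² per head. All dimensions (12 layers, 8 heads, head dimension 64, rank 32, 2 bytes per element) and the choice of two r×r matrices per head are hypothetical values picked for illustration; they are not taken from the paper.

```python
def kv_cache_bytes(seq_len, n_layers, n_heads, head_dim, bytes_per_elem=2):
    """Bytes held by a standard KV cache: one key and one value vector
    per token, per head, per layer."""
    return 2 * n_layers * seq_len * n_heads * head_dim * bytes_per_elem


def fixed_state_bytes(n_layers, n_heads, rank, n_matrices=2, bytes_per_elem=2):
    """Bytes held by a constant-size state of n_matrices (rank x rank)
    matrices per head, per layer. n_matrices=2 is an assumption."""
    return n_matrices * n_layers * n_heads * rank * rank * bytes_per_elem


# Hypothetical model dimensions, chosen only for illustration.
LAYERS, HEADS, HEAD_DIM, RANK = 12, 8, 64, 32

for seq_len in (4096, 8192, 65536):
    cache = kv_cache_bytes(seq_len, LAYERS, HEADS, HEAD_DIM)
    state = fixed_state_bytes(LAYERS, HEADS, RANK)
    print(f"{seq_len:>6} tokens: KV cache {cache / 2**20:8.1f} MiB, "
          f"fixed state {state / 2**10:6.1f} KiB")
```

Under these assumed dimensions the cache doubles each time the sequence length doubles, while the fixed state does not depend on sequence length at all.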

📝 Abstract

Long chain-of-thought reasoning and agentic tool-calling produce traces spanning tens of thousands of tokens, yet Transformer KV caches grow linearly with sequence length, creating a memory bottleneck on commodity hardware. State-space models offer constant-memory recurrence but suffer a memory cliff: retrieval accuracy collapses once the gap between a stored fact and its query exceeds the effective horizon of the recurrent state. We introduce Echo, a KV-cache-free associative recall architecture built around Spectral Koopman Attention (SKA), a drop-in replacement for attention layers that augments SSM blocks with a closed-form dynamical operator whose sufficient statistics are accumulated in constant memory with no KV cache. Echo fits a spectral linear system to the key and value history via kernel ridge regression and retrieves through a learned power-iterated filter, all from $O(r^{2})$ streaming state where $r$ is a small projection rank. On the Multi-Query Associative Recall benchmark, a pure Mamba-2 SSM fails to exceed chance accuracy (${\sim}3\%$) across all gap lengths and KV-pair counts, while at the 50M parameter scale SKA-augmented models achieve $100\%$ retrieval accuracy on every configuration tested, including distractor gaps of $4{,}096$ tokens with $32$ KV pairs. Across five additional transfer benchmarks including needle-in-a-haystack, tool-trace, and multi-hop retrieval, SKA consistently outperforms both pure SSM and SSM+Attention hybrids while maintaining constant inference memory. Ablations confirm that the spectral operator, not the prefix masking strategy, drives the retrieval gain.
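The idea of accumulating sufficient statistics in constant memory can be illustrated with a plain linear ridge regression from keys to values. This is a minimal sketch and not the paper's method: it uses no kernel, no Koopman operator, and no power-iterated filter. The class name `StreamingRidgeMemory`, the `ridge` parameter, and the choice of r-dimensional values are all hypothetical. It shows only that the two r×r matrices $G = \sum_t k_t k_t^\top$ and $C = \sum_t k_t v_t^\top$ are enough to answer a query via $\hat{v} = C^\top (G + \lambda I)^{-1} q$, so individual key-value pairs never need to be stored.

```python
class StreamingRidgeMemory:
    """Constant-size associative memory fit by linear ridge regression
    (illustrative only; names and structure are assumptions)."""

    def __init__(self, rank, ridge=1e-3):
        self.rank = rank
        self.ridge = ridge
        # Sufficient statistics: two rank x rank matrices whose size
        # does not depend on how many pairs have been seen.
        self.gram = [[0.0] * rank for _ in range(rank)]   # sum of k k^T
        self.cross = [[0.0] * rank for _ in range(rank)]  # sum of k v^T

    def update(self, key, value):
        """Fold one (key, value) pair into the statistics; the pair
        itself is not retained."""
        for i in range(self.rank):
            for j in range(self.rank):
                self.gram[i][j] += key[i] * key[j]
                self.cross[i][j] += key[i] * value[j]

    @staticmethod
    def _solve(a, b):
        """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
        n = len(b)
        m = [row[:] + [b[i]] for i, row in enumerate(a)]
        for col in range(n):
            pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
            m[col], m[pivot] = m[pivot], m[col]
            for r in range(n):
                if r != col:
                    f = m[r][col] / m[col][col]
                    for c in range(col, n + 1):
                        m[r][c] -= f * m[col][c]
        return [m[i][n] / m[i][i] for i in range(n)]

    def recall(self, query):
        """Return C^T (G + ridge * I)^{-1} q."""
        reg = [[self.gram[i][j] + (self.ridge if i == j else 0.0)
                for j in range(self.rank)] for i in range(self.rank)]
        z = self._solve(reg, query)
        return [sum(self.cross[i][j] * z[i] for i in range(self.rank))
                for j in range(self.rank)]


# Usage: store three pairs with orthogonal keys, then query the second key.
mem = StreamingRidgeMemory(rank=4)
keys = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
values = [[0.5, -1.0, 2.0, 0.0], [3.0, 0.0, 0.0, 1.0], [0.0, 4.0, -2.0, 0.0]]
for k, v in zip(keys, values):
    mem.update(k, v)
recalled = mem.recall(keys[1])  # close to values[1], shrunk slightly by the ridge term
```

With orthogonal keys the recall is nearly exact. With correlated keys a linear fit of this kind would blend values together, which is presumably part of what the kernel and spectral components described in the abstract are meant to address.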

Problem

Research questions and friction points this paper is trying to address.

KV cache

memory bottleneck

associative recall

state-space models

long-context retrieval

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral Koopman Attention

KV-cache-free

Associative Recall