ManifoldKV: Training-Free KV Cache Compression via Euclidean Outlier Detection

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that key-value (KV) cache memory grows linearly with sequence length in long-context reasoning, which necessitates compression that retains critical historical information. The authors propose a training-free, plug-and-play KV cache compression method that, for the first time, employs Euclidean distance rather than cosine similarity to measure the deviation of key vectors from local centroids, thereby jointly capturing angular and magnitude information when assessing token importance. This mitigates both directional conflicts and the dilution of a global centroid. A sliding-window mechanism further improves robustness in ultra-long contexts. On the RULER benchmark, the method achieves 95.7% accuracy at 20% compression across 4K–16K contexts, 92.4% accuracy at 50% compression on the 3-key NIAH task (a 15.4-point improvement over the baseline), and 84.3% accuracy at 25% compression in 64K contexts, a 49-point recovery.

📝 Abstract
Long-context inference is constrained by KV-cache memory, which grows linearly with sequence length; KV-cache compression therefore hinges on reliably selecting which past tokens to retain. Most geometry-based eviction methods score keys by cosine similarity to a global centroid, but cosine is scale-invariant and discards magnitude cues that distinguish semantically salient tokens. We propose ManifoldKV, a training-free scorer that ranks tokens by Euclidean distance to the key centroid, capturing both angular and radial deviations. On the RULER benchmark, ManifoldKV achieves 95.7% accuracy at 4K–16K contexts with 20% compression, matching the best geometric baseline while improving robustness in two regimes where cosine scoring fails. First, on multi-key retrieval, ManifoldKV reduces directional collisions, achieving 92.4% vs. KeyDiff's 77.0% (+15.4 points) on 3-key NIAH at 50% compression. Second, to address centroid dilution and the performance collapse of global centroids at 64K context, we introduce WindowedManifoldKV, which restores accuracy to 84.3% at 25% compression, a 49-point recovery over global L2 and +3.2 points over KeyDiff. The method requires only 3 lines of code and works across 4 architectures without tuning.
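The core scoring rule described in the abstract, rank tokens by L2 distance to the key centroid and keep the largest outliers, can be sketched in a few lines. The sketch below is illustrative, not the authors' code: the function names, the `keep_ratio` parameter, and the particular sliding-window scheme (a symmetric per-token window) are assumptions; the paper only states that a windowed centroid replaces the global one at long contexts.

```python
import numpy as np

def manifold_kv_scores(keys):
    """Score each token by Euclidean (L2) distance to the global key centroid.

    keys: array of shape (seq_len, head_dim) for one attention head.
    A larger distance marks a stronger outlier, i.e. a token to retain.
    """
    centroid = keys.mean(axis=0)
    return np.linalg.norm(keys - centroid, axis=1)

def windowed_manifold_kv_scores(keys, window=1024):
    """Sliding-window variant (hypothetical scheme): score each token against
    the centroid of its local window to avoid global-centroid dilution."""
    n = len(keys)
    scores = np.empty(n)
    for i in range(n):
        lo = max(0, i - window // 2)
        hi = min(n, i + window // 2 + 1)
        centroid = keys[lo:hi].mean(axis=0)
        scores[i] = np.linalg.norm(keys[i] - centroid)
    return scores

def compress(keys, values, keep_ratio=0.5):
    """Evict low-scoring tokens, keeping the top keep_ratio fraction
    in their original sequence order."""
    scores = manifold_kv_scores(keys)
    k = max(1, int(len(keys) * keep_ratio))
    idx = np.sort(np.argsort(scores)[-k:])  # restore chronological order
    return keys[idx], values[idx]
```

Because the scorer touches only the key tensor and needs no gradients or calibration, it can be dropped into an existing eviction loop, which is consistent with the paper's training-free, plug-and-play claim.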
Problem

Research questions and friction points this paper is trying to address.

KV cache compression
long-context inference
outlier detection
manifold geometry
memory efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

KV cache compression
Euclidean distance
training-free
long-context inference
centroid-based scoring