🤖 AI Summary
This work addresses the severe memory bottleneck in streaming 3D reconstruction from long monocular videos, where the key-value (KV) cache grows linearly with sequence length. Existing approaches either truncate the cache, which compromises geometric fidelity, or rely on attention heuristics that ignore 3D structure and therefore fail to preserve critical geometric details. To overcome these limitations, we propose a training-free KV cache management framework that prunes redundant tokens online by leveraging the model's own 3D geometric outputs. Our method combines hierarchical dual-level importance scoring, a mechanism that protects privileged tokens from eviction, and cosine-similarity-based layer-wise cache budget allocation to improve both cache efficiency and geometric accuracy. Experiments on multiple benchmarks show that our approach reduces KV cache usage by nearly 50% and accelerates inference by 1.75× relative to state-of-the-art methods while maintaining high reconstruction quality.
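The summary does not say how the two scoring levels or the privileged set are defined, so what follows is only a minimal Python sketch of score-based eviction with protected tokens, under stated assumptions. Each cached token belongs to a frame. Its importance is taken to be the product of a hypothetical per-frame score and a per-token score; in the actual method these would be derived from the model's 3D geometry outputs. Tokens in the privileged set are never evicted. The function name `select_tokens_to_keep` and its parameters are illustrative and are not part of the GHOST code.

```python
from typing import List, Sequence, Set


def select_tokens_to_keep(
    frame_ids: Sequence[int],
    frame_scores: Sequence[float],
    token_scores: Sequence[float],
    privileged: Set[int],
    budget: int,
) -> List[int]:
    """Return the sorted indices of cached tokens that survive eviction.

    Hypothetical two-level score: token i is worth
    frame_scores[frame_ids[i]] * token_scores[i].
    Privileged tokens are always kept and count against the budget,
    even if they alone exceed it.
    """
    n = len(token_scores)
    if len(frame_ids) != n:
        raise ValueError("frame_ids and token_scores must have the same length")
    # Protected tokens are kept unconditionally.
    kept = sorted(i for i in privileged if 0 <= i < n)
    remaining = max(budget - len(kept), 0)
    candidates = [i for i in range(n) if i not in privileged]
    # Rank evictable tokens by their combined frame-level x token-level score.
    candidates.sort(key=lambda i: frame_scores[frame_ids[i]] * token_scores[i], reverse=True)
    return sorted(kept + candidates[:remaining])


# Two frames of three tokens each; token 0 is a hypothetical protected token.
frame_ids = [0, 0, 0, 1, 1, 1]
frame_scores = [0.5, 1.0]
token_scores = [0.9, 0.2, 0.4, 0.3, 0.8, 0.1]
print(select_tokens_to_keep(frame_ids, frame_scores, token_scores, {0}, budget=3))
```

With a budget of three, token 0 survives because it is protected. Tokens 4 and 3 survive on score, so the call prints `[0, 3, 4]`. Without protection, token 0 would still win here (0.45 beats 0.3). The point of the privilege rule is that protected tokens cannot be dropped even when their score is low. The trade-off is that the budget may be overshot if many tokens are protected.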
📝 Abstract
Streaming 3D reconstruction from long monocular video sequences requires maintaining a key-value (KV) cache that grows linearly with sequence length, creating a severe memory bottleneck. Existing approaches either truncate the cache to a fixed set of anchor frames, which degrades reconstruction quality, or rely on attention-score heuristics that are agnostic to 3D scene structure and therefore fail to preserve geometrically valuable tokens. To address these problems, we present GHOST (Geometry-Hierarchical Online Streaming Token Eviction), a training-free KV cache management framework that exploits the model's own 3D geometry outputs to evict redundant tokens online. GHOST introduces three mutually reinforcing innovations: a hierarchical dual-level importance scoring scheme, a privilege mechanism that protects special tokens from eviction, and a cosine-similarity-guided layer-wise budget allocation. Experiments on several benchmarks show that GHOST maintains excellent reconstruction quality while cutting the KV cache nearly in half and delivering 1.75x faster inference than state-of-the-art methods. Our code is available at https://github.com/lokiniuniu/GHOST.
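The abstract also leaves the layer-wise allocation rule unspecified. The sketch below encodes one plausible reading, purely as an assumption. It measures how much each layer changes its input using the cosine similarity between a hypothetical per-layer input and output summary vector (for example, mean hidden states). Layers with lower similarity then receive a larger share of a fixed total token budget, after a minimum is reserved for every layer. `cosine_similarity` and `allocate_layer_budgets` are illustrative names, not GHOST's API.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def allocate_layer_budgets(
    similarities: Sequence[float], total_budget: int, min_per_layer: int = 1
) -> List[int]:
    """Split total_budget across layers, favoring layers with low similarity.

    Hypothetical rule: after reserving min_per_layer tokens per layer, the
    spare budget is shared in proportion to (1 - similarity), so layers that
    transform their input more keep more cached tokens.
    """
    num_layers = len(similarities)
    if num_layers == 0:
        return []
    spare = total_budget - min_per_layer * num_layers
    if spare < 0:
        raise ValueError("total_budget is smaller than the per-layer minimum")
    # A small epsilon keeps every weight positive, even for unchanged layers.
    weights = [max(1.0 - s, 0.0) + 1e-6 for s in similarities]
    weight_sum = sum(weights)
    shares = [spare * w / weight_sum for w in weights]
    budgets = [min_per_layer + math.floor(s) for s in shares]
    # Largest-remainder rounding: give leftover tokens to the largest fractions
    # so the per-layer budgets sum exactly to total_budget.
    leftover = total_budget - sum(budgets)
    order = sorted(range(num_layers), key=lambda i: shares[i] - math.floor(shares[i]), reverse=True)
    for i in order[:max(leftover, 0)]:
        budgets[i] += 1
    return budgets


# Hypothetical (input, output) summary vectors for three layers.
probes = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0]), ([1.0, 0.0], [1.0, 1.0])]
sims = [cosine_similarity(x, y) for x, y in probes]
print(allocate_layer_budgets(sims, total_budget=100, min_per_layer=10))
```

The three probe pairs give similarities of 1.0, 0.0 and about 0.71. With a total of 100 tokens and a floor of 10 per layer, the call prints `[10, 64, 26]`. The layer that changes its input the most receives the largest cache, and the layer that leaves it untouched keeps only the floor.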