STAC: Plug-and-Play Spatio-Temporal Aware Cache Compression for Streaming 3D Reconstruction

📅 2026-03-18
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the memory bottleneck in online streaming 3D reconstruction caused by the linear growth of KV caches in causal Transformers, where naively discarding early cache entries degrades reconstruction quality and temporal consistency. To this end, the authors propose STAC, a framework that reveals spatio-temporal sparsity in the attention mechanism and introduces a plug-and-play cache compression strategy. The approach retains critical temporal tokens based on decay-weighted cumulative attention scores, compresses spatially redundant tokens into voxel-aligned representations, and incorporates chunked multi-frame joint optimization to enhance temporal coherence and GPU efficiency. Without modifying the backbone network, STAC achieves state-of-the-art reconstruction quality while reducing memory consumption by nearly 10× and accelerating inference by 4×, substantially improving the real-time performance and scalability of streaming 3D reconstruction.
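The summary's temporal-token retention rule can be illustrated with a small sketch. This is a hypothetical reading of "decay-weighted cumulative attention scores": each cached token keeps an exponentially decayed running sum of the attention mass it receives at every streaming step, and only the highest-scoring tokens survive eviction. The function names, the decay factor, and the sum-over-queries aggregation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def update_scores(scores, attn, decay=0.9):
    """Decay past scores, then add the attention mass each cached
    token received at the current step (attn: [num_queries, num_cached])."""
    return decay * scores + attn.sum(axis=0)

def select_tokens(scores, budget):
    """Keep the `budget` cached tokens with the highest scores."""
    return np.argsort(scores)[::-1][:budget]

rng = np.random.default_rng(0)
scores = np.zeros(8)                          # one score per cached token
for _ in range(5):                            # simulate 5 streaming steps
    attn = rng.random((4, 8))
    attn /= attn.sum(axis=1, keepdims=True)   # rows sum to 1, softmax-like
    scores = update_scores(scores, attn)
keep = select_tokens(scores, budget=4)        # indices of tokens to retain
print(len(keep))
```

The decay term biases retention toward tokens that stay relevant across recent frames, rather than tokens that were only attended to once early in the stream.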

πŸ“ Abstract
Online 3D reconstruction from streaming inputs requires both long-term temporal consistency and efficient memory usage. Although causal VGGT transformers address this challenge through a key-value (KV) cache mechanism, the cache grows linearly with the stream length, creating a major memory bottleneck. Under limited memory budgets, early cache eviction significantly degrades reconstruction quality and temporal consistency. In this work, we observe that attention in causal transformers for 3D reconstruction exhibits intrinsic spatio-temporal sparsity. Based on this insight, we propose STAC, a Spatio-Temporally Aware Cache Compression framework for streaming 3D reconstruction with large causal transformers. STAC consists of three key components: (1) a Working Temporal Token Caching mechanism that preserves long-term informative tokens using decayed cumulative attention scores; (2) a Long-term Spatial Token Caching scheme that compresses spatially redundant tokens into voxel-aligned representations for memory-efficient storage; and (3) a Chunk-based Multi-frame Optimization strategy that jointly processes consecutive frames to improve temporal coherence and GPU efficiency. Extensive experiments show that STAC achieves state-of-the-art reconstruction quality while reducing memory consumption by nearly 10x and accelerating inference by 4x, substantially improving the scalability of real-time 3D reconstruction in streaming settings.
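The abstract's second component, Long-term Spatial Token Caching, merges spatially redundant tokens into voxel-aligned representations. A minimal sketch of one plausible realization: tokens whose associated 3D points fall in the same voxel are averaged into a single cached token. The voxel size and the mean-pooling merge rule are assumptions for illustration; the paper's actual compression scheme may differ.

```python
import numpy as np

def voxel_compress(points, feats, voxel_size=0.5):
    """Merge token features that map to the same voxel cell.
    points: [N, 3] 3D positions; feats: [N, D] token features."""
    cells = np.floor(points / voxel_size).astype(np.int64)  # voxel index per token
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True)
    merged = np.zeros((counts.size, feats.shape[1]))
    np.add.at(merged, inverse, feats)        # sum features falling in each voxel
    return merged / counts[:, None]          # average per voxel

pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.1], [1.9, 0.0, 0.0]])
fts = np.array([[1.0, 0.0], [3.0, 0.0], [5.0, 5.0]])
out = voxel_compress(pts, fts)
print(out.shape)   # (2, 2): 3 tokens compressed to 2 voxel-aligned tokens
```

Storage then scales with the occupied volume of the scene rather than with the stream length, which is consistent with the near-10x memory reduction the abstract reports.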
Problem

Research questions and friction points this paper is trying to address.

streaming 3D reconstruction
memory bottleneck
temporal consistency
KV cache
causal transformers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatio-Temporal Sparsity
Cache Compression
Streaming 3D Reconstruction
Causal Transformer
Memory-Efficient Attention
Runze Wang
University of Science and Technology of China

Yuxuan Song
Tsinghua University
Deep Generative Models, LLM4Science

Youcheng Cai
University of Science and Technology of China

Ligang Liu
University of Science and Technology of China
Computer Graphics, Geometry Processing, 3D Printing