ZipMap: Linear-Time Stateful 3D Reconstruction with Test-Time Training

📅 2026-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scalability limits of Transformer-based 3D reconstruction methods, whose cost grows quadratically with the number of input images and makes large image collections impractical; sequential approaches are cheaper but lose reconstruction accuracy. To resolve this trade-off, the authors propose ZipMap, a stateful feed-forward architecture that uses test-time training layers to compress a multi-view image collection into a compact hidden scene state in a single forward pass, enabling bidirectional 3D reconstruction in linear time. The method matches or surpasses the accuracy of quadratic-time methods while scaling linearly, and its stateful representation supports real-time scene-state querying and extends to sequential streaming reconstruction. It reconstructs over 700 frames in under 10 seconds on a single H100 GPU, more than 20× faster than state-of-the-art methods such as VGGT.
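
To make the quadratic-versus-linear distinction concrete, here is a minimal sketch that counts operations only; it does not model wall-clock time or reproduce any reported speedup. The function names and the `tokens_per_image` parameter are illustrative assumptions, not quantities taken from the paper.

```python
def attention_pair_count(num_images: int, tokens_per_image: int) -> int:
    """Token pairs scored by global self-attention over all images.

    Every token attends to every token, so the count is (N * T) ** 2.
    """
    total_tokens = num_images * tokens_per_image
    return total_tokens * total_tokens


def linear_token_count(num_images: int, tokens_per_image: int) -> int:
    """Tokens visited by a pass that touches each token once: N * T."""
    return num_images * tokens_per_image


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the growth rates.
    for n in (100, 200, 400):
        print(n, attention_pair_count(n, 10), linear_token_count(n, 10))
```

Doubling the number of images quadruples the first count and only doubles the second, which is why the gap widens as collections grow.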

📝 Abstract
Feed-forward transformer models have driven rapid progress in 3D vision, but state-of-the-art methods such as VGGT and $\pi^3$ have a computational cost that scales quadratically with the number of input images, making them inefficient when applied to large image collections. Sequential-reconstruction approaches reduce this cost but sacrifice reconstruction quality. We introduce ZipMap, a stateful feed-forward model that achieves linear-time, bidirectional 3D reconstruction while matching or surpassing the accuracy of quadratic-time methods. ZipMap employs test-time training layers to zip an entire image collection into a compact hidden scene state in a single forward pass, enabling reconstruction of over 700 frames in under 10 seconds on a single H100 GPU, more than $20\times$ faster than state-of-the-art methods such as VGGT. Moreover, we demonstrate the benefits of having a stateful representation in real-time scene-state querying and its extension to sequential streaming reconstruction.
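
The abstract does not spell out the layer design, so the following is only a generic sketch of the test-time training idea, not ZipMap's architecture. It assumes the hidden state is a single linear "fast weight" matrix `W`, updated by gradient descent on a reconstruction loss over all tokens. The names `zip_scene_state` and `query_scene_state`, the loss, and the learning rate are all hypothetical.

```python
def zip_scene_state(keys, values, lr=1.0, steps=1):
    """Compress N (key, value) token pairs into a fixed-size matrix W.

    Each step is one gradient-descent update on the loss
    0.5 * mean_i ||W k_i - v_i||^2. A step visits every token once,
    so its cost is linear in N, and W's size does not depend on N.
    """
    n, d_in, d_out = len(keys), len(keys[0]), len(values[0])
    W = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(steps):
        grad = [[0.0] * d_in for _ in range(d_out)]
        for k, v in zip(keys, values):  # single pass over the tokens
            pred = [sum(w * x for w, x in zip(row, k)) for row in W]
            for i in range(d_out):
                err = pred[i] - v[i]
                for j in range(d_in):
                    grad[i][j] += err * k[j] / n
        W = [[W[i][j] - lr * grad[i][j] for j in range(d_in)]
             for i in range(d_out)]
    return W


def query_scene_state(W, query):
    """Read from the compressed state: cost is independent of N."""
    return [sum(w * x for w, x in zip(row, query)) for row in W]


if __name__ == "__main__":
    # Toy data (assumption): two orthogonal keys with scalar values.
    state = zip_scene_state([[1.0, 0.0], [0.0, 1.0]], [[2.0], [3.0]], lr=2.0)
    print(query_scene_state(state, [1.0, 0.0]))
```

Because every token contributes to `W` before any query is answered, each read reflects the whole collection rather than only earlier frames, which is one way to picture a bidirectional stateful model.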

Problem

Research questions and friction points this paper is trying to address.

3D reconstruction
computational efficiency
quadratic complexity
stateful modeling
large-scale image collections

Innovation

Methods, ideas, or system contributions that make the work stand out.

linear-time reconstruction
stateful 3D reconstruction
test-time training
feed-forward transformer
scene state compression