VGG-T$^3$: Offline Feed-Forward 3D Reconstruction at Scale

📅 2026-02-26

🤖 AI Summary

This work addresses the scalability limits of offline feed-forward 3D reconstruction methods, whose computational and memory costs grow quadratically with the number of input views. To overcome this, the authors introduce test-time training into offline 3D reconstruction for the first time and propose a fixed-size MLP as the scene representation. The variable-length geometric key-value (KV) pairs are distilled into this MLP through test-time optimization, which gives linear complexity in the number of views while preserving global scene aggregation. On a collection of 1,000 images, the approach completes reconstruction in 54 seconds, 11.6× faster than a softmax-attention baseline, and it achieves lower point-map reconstruction error than existing linear-complexity methods by large margins.
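
As a rough illustration of the scaling claim, the two costs can be compared symbolically. The symbols $N$ (number of views), $T$ (tokens per view), $d$ (feature dimension), and $P$ (number of MLP parameters) are assumptions introduced here for illustration and are not taken from the paper:

$$
C_{\text{softmax}} = O\big((NT)^2\, d\big), \qquad C_{\text{MLP}} = O\big(NT \cdot P\big)
$$

With $T$, $d$, and $P$ held fixed, the first term grows quadratically in $N$ because every token attends to every other token, while the second grows linearly because each token only passes through a network whose size does not depend on $N$.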

📝 Abstract

We present a scalable 3D reconstruction model that addresses a critical limitation in offline feed-forward methods: their computational and memory requirements grow quadratically w.r.t. the number of input images. Our approach is built on the key insight that this bottleneck stems from the varying-length Key-Value (KV) space representation of scene geometry, which we distill into a fixed-size Multi-Layer Perceptron (MLP) via test-time training. VGG-T$^3$ (Visual Geometry Grounded Test Time Training) scales linearly w.r.t. the number of input views, similar to online models, and reconstructs a $1k$ image collection in just $54$ seconds, achieving an $11.6\times$ speed-up over baselines that rely on softmax attention. Since our method retains global scene aggregation capability, it outperforms other linear-time methods in point map reconstruction error by large margins. Finally, we demonstrate visual localization capabilities of our model by querying the scene representation with unseen images.
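
The idea of distilling a variable-length set of key-value pairs into a fixed-size MLP can be illustrated with a toy sketch. This is not the authors' implementation: the two-layer tanh network, the plain mean-squared-error objective, the chunked gradient descent, and every name below (`init_mlp`, `ttt_fit`, `param_count`, and so on) are assumptions chosen only to show why the representation size stays constant while the fitting cost grows linearly with the number of pairs.

```python
import numpy as np


def init_mlp(d_in, d_hidden, d_out, seed=0):
    """Create a small two-layer MLP (hypothetical stand-in for the scene MLP)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(0.0, 1.0 / np.sqrt(d_hidden), (d_hidden, d_out)),
        "b2": np.zeros(d_out),
    }


def mlp_forward(params, x):
    """Return the MLP output and the hidden activations for inputs x."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"], h


def mse(params, keys, values):
    """Mean squared error between MLP(keys) and values."""
    pred, _ = mlp_forward(params, keys)
    return float(np.mean((pred - values) ** 2))


def param_count(params):
    """Number of scalars in the representation; independent of the KV count."""
    return sum(p.size for p in params.values())


def ttt_fit(params, keys, values, lr=0.05, epochs=20, chunk=128):
    """Fit the MLP to map keys to values, one chunk of pairs at a time.

    Each epoch touches every KV pair once, so the cost is linear in the
    number of pairs, unlike all-pairs softmax attention.
    """
    n = keys.shape[0]
    for _ in range(epochs):
        for start in range(0, n, chunk):
            k = keys[start:start + chunk]
            v = values[start:start + chunk]
            pred, h = mlp_forward(params, k)
            # Gradient of the mean squared error w.r.t. the output.
            g_out = 2.0 * (pred - v) / pred.size
            # Backpropagate through the tanh hidden layer.
            g_h = (g_out @ params["W2"].T) * (1.0 - h ** 2)
            grads = {
                "W2": h.T @ g_out,
                "b2": g_out.sum(axis=0),
                "W1": k.T @ g_h,
                "b1": g_h.sum(axis=0),
            }
            for name in params:
                params[name] -= lr * grads[name]
    return params


# Synthetic KV pairs standing in for per-token geometry features.
rng = np.random.default_rng(1)
keys = rng.normal(size=(512, 16))
values = np.sin(keys @ (rng.normal(size=(16, 8)) / 4.0))

scene = init_mlp(d_in=16, d_hidden=64, d_out=8)
loss_before = mse(scene, keys, values)
scene = ttt_fit(scene, keys, values)
loss_after = mse(scene, keys, values)

# Query the fitted representation with unseen inputs.
queries = rng.normal(size=(5, 16))
answers, _ = mlp_forward(scene, queries)
print(loss_before, loss_after, answers.shape, param_count(scene))
```

Doubling the number of KV pairs in this sketch doubles the work per epoch but leaves `param_count(scene)` unchanged, which mirrors the fixed-size property the abstract describes. Querying with new inputs after fitting loosely corresponds to the abstract's description of querying the scene representation with unseen images.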

Problem

Research questions and friction points this paper is trying to address.

3D reconstruction

offline feed-forward

scalability

computational complexity

memory bottleneck

Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time training

linear scalability

fixed-size MLP