🤖 AI Summary
Modern recurrent neural networks (RNNs) offer computational efficiency in 3D reconstruction due to their linear time complexity, yet suffer from poor length generalization, degrading on sequences longer than those seen during training. To address this, we formulate 3D reconstruction as a test-time training problem and propose a lightweight, training-free online learning framework. Our method introduces a closed-form learning rate derived from the alignment confidence between the memory state and incoming observations, dynamically balancing retention of historical information against adaptation to new observations. Coupled with a GPU-optimized recurrent architecture and test-time memory update mechanisms, the framework enables low-memory, high-frame-rate inference. On global pose estimation, our approach achieves a 2× accuracy improvement over baselines, operates at 20 FPS using only 6 GB of GPU memory, and robustly handles sequences spanning several thousand frames.
📝 Abstract
Modern Recurrent Neural Networks have become a competitive architecture for 3D reconstruction due to their linear-time complexity. However, their performance degrades significantly when applied beyond the training context length, revealing limited length generalization. In this work, we revisit 3D reconstruction foundation models from a Test-Time Training perspective, framing their designs as an online learning problem. Building on this perspective, we leverage the alignment confidence between the memory state and incoming observations to derive a closed-form learning rate for memory updates, balancing the retention of historical information with adaptation to new observations. This training-free intervention, termed TTT3R, substantially improves length generalization, achieving a $2\times$ improvement in global pose estimation over baselines, while operating at 20 FPS with just 6 GB of GPU memory to process thousands of images. Code available at https://rover-xingyu.github.io/TTT3R
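To make the memory-update idea concrete, here is a minimal NumPy sketch of a confidence-gated recurrent state update. It is illustrative only: the cosine-similarity confidence gate, the sigmoid squashing, and the function names (`confidence_lr`, `update_memory`, `beta`) are assumptions for exposition, not the paper's actual closed-form rule. The sketch captures the stated behavior: when the incoming observation aligns well with the memory state, the learning rate shrinks (retain history); when alignment is poor, it grows (adapt to the new observation).

```python
import numpy as np

def confidence_lr(memory, obs_key, beta=8.0):
    """Hypothetical alignment-confidence learning rate.

    Confidence is modeled as cosine similarity between the memory
    state and the incoming observation key, mapped through a sigmoid
    so that high alignment -> small learning rate (retain history)
    and low alignment -> large learning rate (adapt to new input).
    """
    m = memory / (np.linalg.norm(memory) + 1e-8)
    k = obs_key / (np.linalg.norm(obs_key) + 1e-8)
    align = float(m @ k)  # cosine similarity in [-1, 1]
    return 1.0 / (1.0 + np.exp(beta * align))  # in (0, 1)

def update_memory(memory, obs_value, eta):
    """Convex combination of old state and new observation."""
    return (1.0 - eta) * memory + eta * obs_value
```

Because the update is a convex combination with `eta` in (0, 1), the state never jumps fully to the new observation, which is one simple way to trade off stability against plasticity in an online, training-free setting.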