Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass

📅 2025-01-23
📈 Citations: 0 (influential: 0)
🤖 AI Summary
To address the inefficiency and error accumulation inherent in pairwise matching and iterative alignment for large-scale multi-view 3D reconstruction, this paper proposes the first end-to-end multi-view extension of DUSt3R, abandoning both its pairwise paradigm and its post-hoc global alignment. Methodologically, the authors design a Transformer-based encoder that processes multiple images jointly, using cross-view attention and learnable geometric priors to align features across views and to jointly regress depth and camera pose for an arbitrary number of input views. The approach achieves state-of-the-art accuracy on multiple benchmarks, reducing pose estimation error by 32% and eliminating cumulative drift. Inference is also more than an order of magnitude faster: a scene with a thousand images is reconstructed in a single forward pass, which substantially improves practicality and scalability for large-scale scene reconstruction.
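
The joint multi-view processing described above can be illustrated with a toy, pure-Python sketch of self-attention over the concatenated patch tokens of all N views. This is a hedged illustration, not Fast3R's actual architecture: the per-view index embedding, the identity query/key/value projections, the single attention head, and the names `flatten_views` and `joint_attention` are all assumptions made for brevity. What it does show is the contrast with a pairwise model: each token's attention row spans the tokens of every view, not just two.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_views(views: List[List[Vector]], index_emb: List[Vector]) -> List[Vector]:
    """Concatenate the patch tokens of all N views into a single sequence.

    views[i] holds the patch tokens of image i. index_emb[i] is a
    hypothetical per-image embedding added to every token of image i so
    that each token keeps track of which view it came from.
    """
    return [
        [x + e for x, e in zip(tok, index_emb[i])]
        for i, view in enumerate(views)
        for tok in view
    ]


def joint_attention(tokens: List[Vector]) -> Tuple[List[List[float]], List[Vector]]:
    """Single-head self-attention with identity Q/K/V projections (a simplification).

    Every token attends to every token of every view, so one call mixes
    information across all N images at once. Returns (weights, outputs).
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    # One row of attention weights per query token, covering all tokens of all views.
    weights = [
        softmax([scale * sum(qj * kj for qj, kj in zip(q, k)) for k in tokens])
        for q in tokens
    ]
    # Each output is the attention-weighted average of all token vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs


# Toy example: 3 views, 2 patch tokens each, 2-dimensional tokens.
example_views = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.75]],
]
example_index_emb = [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]]
example_tokens = flatten_views(example_views, example_index_emb)
weights, outputs = joint_attention(example_tokens)
```

In a pairwise model, each attention row would cover only the tokens of two images; here each of the six rows spans all six tokens from the three views.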


📝 Abstract
Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally pairwise approach, processing images in pairs and necessitating costly global alignment procedures to reconstruct from multiple views. In this work, we propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization of DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel. Fast3R's Transformer-based architecture processes N images in a single forward pass, bypassing the need for iterative alignment. Through extensive experiments on camera pose estimation and 3D reconstruction, Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error accumulation. These results establish Fast3R as a robust alternative for multi-view applications, offering enhanced scalability without compromising reconstruction accuracy.

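
The cost difference between pairwise processing and a single joint pass can be made concrete by counting. The sketch below assumes either an exhaustive (complete) pair graph or, as a cheaper hypothetical alternative, a sliding window in which each image is paired only with its next k images. It counts unordered pairs, each of which needs at least one pairwise network evaluation before global alignment even starts. The function names are hypothetical, and the count ignores memory limits on how many views fit into one joint pass.

```python
def complete_graph_pairs(n: int) -> int:
    """Unordered image pairs when every image is paired with every other image."""
    return n * (n - 1) // 2


def sliding_window_pairs(n: int, k: int) -> int:
    """Unordered pairs when image i is paired only with images i+1, ..., i+k."""
    return sum(min(k, n - 1 - i) for i in range(n))


if __name__ == "__main__":
    # A joint multi-view model instead consumes all n views in one forward pass
    # (assuming they fit in memory), so its count is always 1.
    for n in (10, 100, 1000):
        print(f"n={n}: complete={complete_graph_pairs(n)}, "
              f"window(k=10)={sliding_window_pairs(n, 10)}, joint=1")
```

For 1,000 images, an exhaustive pair graph has 499,500 pairs, and even a sliding window with k = 10 still needs 9,945 pairwise evaluations plus the iterative alignment, whereas a single joint pass consumes all views at once.
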
Problem

Research questions and friction points this paper is trying to address.

3D Image Reconstruction
Large-scale Image Processing
Alignment Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fast3R
Transformer design
3D image reconstruction

Jianing Yang
Meta

Alexander Sax
University of Michigan

Kevin J. Liang
Meta

Mikael Henaff
Meta
Artificial Intelligence, Deep Learning, Reinforcement Learning

Hao Tang
Meta

Ang Cao
Meta, University of Michigan

Joyce Chai
University of Michigan

Franziska Meier
Research Scientist, Facebook AI Research
Machine Learning, Robotics

Matt Feiszli
Facebook AI Research
Machine Learning, Computer Vision, Harmonic Analysis, Geometry