AMB3R: Accurate Feed-forward Metric-scale 3D Reconstruction with Backend

📅 2025-11-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses accurate, metric-scale dense 3D reconstruction that generalizes across tasks. It proposes AMB3R, a feed-forward multi-view reconstruction model that uses a sparse yet compact volumetric scene representation as its backend, enabling geometric reasoning with spatial compactness. Although trained only for multi-view reconstruction, the model extends to online uncalibrated visual odometry (VO) and large-scale structure from motion (SfM) without task-specific fine-tuning or test-time optimization. Compared with prior pointmap-based models, it reports state-of-the-art results in camera pose, depth, metric-scale estimation, and 3D reconstruction, and it surpasses optimization-based SLAM and SfM methods that use dense reconstruction priors on common benchmarks.

📝 Abstract

We present AMB3R, a multi-view feed-forward model for dense 3D reconstruction at metric scale that addresses diverse 3D vision tasks. The key idea is to leverage a sparse, yet compact, volumetric scene representation as our backend, enabling geometric reasoning with spatial compactness. Although trained solely for multi-view reconstruction, we demonstrate that AMB3R can be seamlessly extended to uncalibrated visual odometry (online) or large-scale structure from motion without the need for task-specific fine-tuning or test-time optimization. Compared to prior pointmap-based models, our approach achieves state-of-the-art performance in camera pose, depth, metric-scale estimation, and 3D reconstruction, and it even surpasses optimization-based SLAM and SfM methods with dense reconstruction priors on common benchmarks.
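To make the idea of a "sparse, yet compact, volumetric scene representation" concrete, here is a minimal sketch of generic sparse voxelization in plain Python. It is not AMB3R's backend: the function name `voxelize`, the `voxel_size` parameter, and the choice of storing a centroid and point count per voxel are all assumptions made for illustration. It only shows why a sparse volume is compact: memory is spent on occupied voxels, not on the empty space between them.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group 3D points (metric units) into a sparse voxel map.

    Hypothetical helper for illustration only, not the paper's method.
    Returns {(i, j, k): (centroid, count)} with entries for occupied voxels only.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Accumulate coordinate sums and a point count per occupied voxel.
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for x, y, z in points:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1

    # Reduce each voxel to the centroid of its points plus the count.
    return {
        key: ((sx / n, sy / n, sz / n), n)
        for key, (sx, sy, sz, n) in sums.items()
    }


# Toy input: two nearby points and one far away (coordinates in metres).
points = [(0.01, 0.02, 0.03), (0.04, 0.01, 0.02), (1.02, 0.00, 0.51)]
grid = voxelize(points, voxel_size=0.05)
for key in sorted(grid):
    centroid, count = grid[key]
    print(key, centroid, count)
```

In this toy input the two nearby points share one voxel and the distant point gets its own, so only two entries are stored regardless of how much empty space separates them.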

Problem

Research questions and friction points this paper is trying to address.

Achieving accurate metric-scale dense 3D reconstruction from multi-view images

Extending a single model to visual odometry and structure from motion without task-specific fine-tuning

Matching or surpassing optimization-based SLAM and SfM methods in camera pose and depth estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

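Feed-forward multi-view reconstruction with a sparse, compact volumetric backend

Extension to online visual odometry and large-scale structure from motion without fine-tuning or test-time optimization

Surpasses optimization-based SLAM and SfM methods on common benchmarks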
