Regist3R: Incremental Registration with Stereo Foundation Model

📅 2025-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large-scale 3D reconstruction from unordered multi-view images faces two main challenges: prohibitive computational cost and severe error accumulation in global optimization. This paper proposes an incremental registration framework built on stereo vision foundation models, replacing conventional global optimization with incremental pose estimation, a pointmap-based representation, lightweight feature matching, and feature propagation. Key contributions are: (1) the first incremental stereo foundation model tailored for large-scale multi-view reconstruction; (2) the first successful city-scale reconstruction driven by thousands of viewpoints; and (3) the first benchmark dataset for oblique aerial imagery, featuring long-baseline image sequences with more than a hundred views each. Experiments show that the method matches state-of-the-art optimization-based approaches in accuracy on standard benchmarks while being significantly more efficient, and that it consistently outperforms existing state-of-the-art methods on the custom aerial dataset.

📝 Abstract

Multi-view 3D reconstruction has remained an essential yet challenging problem in the field of computer vision. While DUSt3R and its successors have achieved breakthroughs in 3D reconstruction from unposed images, these methods exhibit significant limitations when scaling to multi-view scenarios, including high computational cost and cumulative error induced by global alignment. To address these challenges, we propose Regist3R, a novel stereo foundation model tailored for efficient and scalable incremental reconstruction. Regist3R leverages an incremental reconstruction paradigm, enabling large-scale 3D reconstructions from unordered and many-view image collections. We evaluate Regist3R on public datasets for camera pose estimation and 3D reconstruction. Our experiments demonstrate that Regist3R achieves comparable performance with optimization-based methods while significantly improving computational efficiency, and outperforms existing multi-view reconstruction models. Furthermore, to assess its performance in real-world applications, we introduce a challenging oblique aerial dataset which has long spatial spans and hundreds of views. The results highlight the effectiveness of Regist3R. We also demonstrate the first attempt to reconstruct large-scale scenes encompassing over thousands of views through pointmap-based foundation models, showcasing its potential for practical applications in large-scale 3D reconstruction tasks, including urban modeling, aerial mapping, and beyond.

Problem

Research questions and friction points this paper is trying to address.

Addressing high computational cost in multi-view 3D reconstruction

Reducing cumulative error from global alignment in reconstruction

Enabling scalable reconstruction from unordered many-view images

Innovation

Methods, ideas, or system contributions that make the work stand out.

Incremental stereo model for scalable reconstruction

Efficient large-scale 3D from unordered images

Pointmap foundation for thousand-view scenes

🔎 Similar Papers

Deep Learning-Based Point Cloud Registration: A Comprehensive Survey and Taxonomy