🤖 AI Summary
Traditional Structure-from-Motion (SfM) methods incur prohibitive computational cost and scale poorly on CPUs for large scenes, while existing deep learning approaches are constrained by GPU memory, which limits them to hundreds of views. This paper introduces a fully sparse, GPU-parallel SfM framework that jointly accelerates global positioning (GP) and bundle adjustment (BA), easing the classical trade-off among accuracy, speed, and scalability. By combining sparsity-aware optimization with fine-grained parallelization, the method reduces both memory footprint and computational cost. On a 5,000-image dataset, it achieves up to about 40× speedup over COLMAP while maintaining comparable or better reconstruction accuracy. It is presented as the first end-to-end SfM system of this kind capable of accurate, fast reconstruction for scenes comprising thousands of views.
📝 Abstract
Structure-from-Motion (SfM), a method that recovers camera poses and scene geometry from uncalibrated images, is a central component in robotic reconstruction and simulation. Although traditional SfM methods such as COLMAP and its follow-up work, GLOMAP, achieve state-of-the-art performance, their naive CPU-specialized implementations of bundle adjustment (BA) and global positioning (GP) introduce significant computational overhead on large-scale scenes, forcing a trade-off between accuracy and speed. Moreover, the efficient C++ implementations in COLMAP and GLOMAP come at the cost of flexibility, as they lack support for various external optimization options. On the other hand, while deep-learning-based SfM pipelines such as VGGSfM and VGGT enable feed-forward 3D reconstruction, they cannot scale to thousands of input views at once because GPU memory consumption increases sharply with the number of input views. In this paper, we exploit GPU parallel computation to accelerate each critical stage of the standard SfM pipeline. Building upon recent advances in sparsity-aware bundle adjustment optimization, our design extends these techniques to accelerate both BA and GP within a unified global SfM framework. Through extensive experiments on datasets of varying scales (e.g., 5,000 images, where VGGSfM and VGGT run out of memory), our method demonstrates up to about 40× speedup over COLMAP while achieving consistently comparable or even improved reconstruction accuracy. Our project page can be found at https://cre185.github.io/InstantSfM/.