🤖 AI Summary
To address the limited field of view of monocular RGB streams, which prevents complete scene coverage in 3D Gaussian Splatting (3DGS), this paper introduces the first real-time, large-scale 3D reconstruction framework tailored to multi-camera rigs. The method combines hierarchical camera initialization with a lightweight multi-camera bundle adjustment to achieve calibration-free, drift-free online trajectory estimation; it further proposes a redundancy-free Gaussian sampling strategy and a frequency-aware optimization scheduler, fusing multi-view data into a unified Gaussian representation for efficient online registration and mapping. Experiments show that, given only raw multi-camera video streams, the system reconstructs high-fidelity scenes spanning hundreds of meters within two minutes, achieving state-of-the-art reconstruction speed, robustness to motion and lighting variation, and geometric fidelity.
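The summary's "lightweight multi-camera bundle adjustment" presumably optimizes a single shared rig pose per timestep while the per-camera extrinsics stay fixed, which keeps the problem small enough for real time. The paper does not publish its formulation, so the sketch below is only an illustrative residual function under that assumption; all names (`reproject`, `rig_ba_residuals`) are hypothetical:

```python
import numpy as np

def reproject(K, T_rig_cam, T_world_rig, X_world):
    # world -> rig frame, then rig -> camera frame (4x4 poses, R|t convention)
    X_rig = T_world_rig[:3, :3].T @ (X_world - T_world_rig[:3, 3])
    X_cam = T_rig_cam[:3, :3].T @ (X_rig - T_rig_cam[:3, 3])
    x = K @ X_cam
    return x[:2] / x[2]  # pixel coordinates

def rig_ba_residuals(K, extrinsics, rig_pose, points, observations):
    """Stack reprojection residuals over all cameras of the rig.

    Hypothetical sketch: only the shared `rig_pose` would be optimized
    (e.g. with a Gauss-Newton step), which is what makes the BA
    'lightweight' compared to per-camera pose optimization.
    """
    res = []
    for cam, obs in observations.items():
        for pid, uv in obs:
            res.append(reproject(K, extrinsics[cam], rig_pose, points[pid]) - uv)
    return np.concatenate(res)
```

A nonlinear least-squares solver (e.g. `scipy.optimize.least_squares`) could then minimize these residuals over a local pose parameterization.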
📝 Abstract
Recent advances in 3D Gaussian Splatting (3DGS) have enabled efficient free-viewpoint rendering and photorealistic scene reconstruction. While on-the-fly extensions of 3DGS have shown promise for real-time reconstruction from monocular RGB streams, they often fail to achieve complete 3D coverage due to the limited field of view (FOV). Employing a multi-camera rig fundamentally addresses this limitation. In this paper, we present the first on-the-fly 3D reconstruction framework for multi-camera rigs. Our method incrementally fuses dense RGB streams from multiple overlapping cameras into a unified Gaussian representation, achieving drift-free trajectory estimation and efficient online reconstruction. We propose a hierarchical camera initialization scheme that enables coarse inter-camera alignment without calibration, followed by a lightweight multi-camera bundle adjustment that stabilizes trajectories while maintaining real-time performance. Furthermore, we introduce a redundancy-free Gaussian sampling strategy and a frequency-aware optimization scheduler to reduce the number of Gaussian primitives and the required optimization iterations, thereby maintaining both efficiency and reconstruction fidelity. Our method reconstructs hundreds of meters of 3D scenes within just 2 minutes using only raw multi-camera video streams, demonstrating unprecedented speed, robustness, and fidelity for on-the-fly 3D scene reconstruction.
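With overlapping cameras, many candidate points would spawn Gaussians in regions already covered by another view. One plausible reading of the "redundancy-free Gaussian sampling strategy" is spatial deduplication: reject candidates whose voxel cell already contains a Gaussian center. The paper does not specify its mechanism, so the following is a minimal sketch under that assumption; the function name and voxel size are hypothetical:

```python
import numpy as np

def redundancy_free_sample(candidates, existing_centers, voxel=0.05):
    """Keep only candidate points whose voxel cell is not already
    occupied by an existing Gaussian center (illustrative sketch).

    candidates:       (N, 3) candidate 3D points from the new frames
    existing_centers: (M, 3) centers of Gaussians already in the map
    voxel:            cell edge length in scene units (assumed value)
    """
    def cell(p):
        return tuple(np.floor(p / voxel).astype(np.int64))

    occupied = {cell(p) for p in existing_centers}
    keep = np.array([cell(p) not in occupied for p in candidates])
    return candidates[keep]
```

This keeps the primitive count roughly proportional to covered volume rather than to the number of overlapping observations, which matches the abstract's stated goal of reducing Gaussian primitives while preserving coverage.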