On-the-fly Large-scale 3D Reconstruction from Multi-Camera Rigs

📅 2025-12-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited field-of-view in monocular RGB streams—which hinders complete scene coverage in 3D Gaussian Splatting (3DGS)—this paper introduces the first real-time, large-scale 3D reconstruction framework tailored for multi-camera devices. Methodologically, it employs hierarchical camera initialization and a lightweight multi-camera bundle adjustment to achieve calibration-free, drift-free online trajectory estimation; further, it proposes a redundancy-free Gaussian sampling strategy and a frequency-aware optimization scheduler, enabling efficient multi-view data fusion via a unified Gaussian representation for online registration and mapping. Experiments demonstrate that, given only raw multi-camera video streams, the system reconstructs high-fidelity scenes spanning hundreds of meters within two minutes. It achieves state-of-the-art performance in reconstruction speed, robustness to motion and lighting variations, and geometric fidelity.

📝 Abstract
Recent advances in 3D Gaussian Splatting (3DGS) have enabled efficient free-viewpoint rendering and photorealistic scene reconstruction. While on-the-fly extensions of 3DGS have shown promise for real-time reconstruction from monocular RGB streams, they often fail to achieve complete 3D coverage due to the limited field of view (FOV). Employing a multi-camera rig fundamentally addresses this limitation. In this paper, we present the first on-the-fly 3D reconstruction framework for multi-camera rigs. Our method incrementally fuses dense RGB streams from multiple overlapping cameras into a unified Gaussian representation, achieving drift-free trajectory estimation and efficient online reconstruction. We propose a hierarchical camera initialization scheme that enables coarse inter-camera alignment without calibration, followed by a lightweight multi-camera bundle adjustment that stabilizes trajectories while maintaining real-time performance. Furthermore, we introduce a redundancy-free Gaussian sampling strategy and a frequency-aware optimization scheduler to reduce the number of Gaussian primitives and the required optimization iterations, thereby maintaining both efficiency and reconstruction fidelity. Our method reconstructs hundreds of meters of 3D scenes within just 2 minutes using only raw multi-camera video streams, demonstrating unprecedented speed, robustness, and fidelity for on-the-fly 3D scene reconstruction.
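The abstract does not spell out the "lightweight multi-camera bundle adjustment." A standard rig-based formulation, which the paper's variant presumably lightens for online use, jointly optimizes the rig trajectory, the camera-to-rig extrinsics (initialized here without prior calibration), and the 3D points by minimizing reprojection error — a textbook sketch, not the paper's exact loss:

```latex
% Assumed standard multi-camera (rig) bundle adjustment objective.
% T_t: rig pose at time t; E_c: camera-to-rig extrinsics of camera c;
% X_j: 3D point j; x_{t,c,j}: its observation in camera c at time t;
% \pi_c: projection through camera c's intrinsics; \rho: robust kernel.
\min_{\{T_t\},\,\{E_c\},\,\{X_j\}}
  \sum_{t,c,j} \rho\!\left(
    \bigl\| \pi_c\!\left( E_c\, T_t^{-1} X_j \right) - x_{t,c,j} \bigr\|^2
  \right)
```

Sharing one rig pose $T_t$ across all cameras at each time step is what couples the overlapping views and suppresses per-camera drift.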
Problem

Research questions and friction points this paper is trying to address.

Real-time 3D reconstruction from multi-camera video streams
Achieving complete 3D coverage without prior calibration
Maintaining efficiency and fidelity in online scene reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incremental fusion of multi-camera RGB streams into unified Gaussian representation
Hierarchical camera initialization and lightweight multi-camera bundle adjustment
Redundancy-free Gaussian sampling and frequency-aware optimization scheduler
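The listing gives no detail on the "redundancy-free Gaussian sampling" beyond its goal of reducing primitive count. One plausible mechanism for a multi-camera rig — offered purely as an illustrative sketch, since overlapping cameras back-project near-duplicate points — is a voxel-hash filter that keeps at most one candidate Gaussian center per voxel. The function name, voxel size, and strategy below are assumptions, not the paper's method:

```python
import numpy as np

def redundancy_free_sample(points, voxel_size=0.05):
    """Keep at most one candidate Gaussian center per voxel.

    `points` is an (N, 3) array of back-projected points pooled from all
    cameras in the rig; overlapping views yield near-duplicate points,
    which this voxel-hash filter drops before Gaussians are instantiated.
    (Hypothetical sketch -- the paper's actual sampling is not specified
    in this summary.)
    """
    keys = np.floor(points / voxel_size).astype(np.int64)
    # np.unique over rows returns the first index of each occupied voxel.
    _, first_idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(first_idx)]

# Two cameras observe nearly the same surface point:
pts = np.array([
    [0.00, 0.00, 1.00],   # seen by camera A
    [0.01, 0.00, 1.00],   # camera B, same 5 cm voxel -> dropped
    [1.00, 0.00, 1.00],   # distinct voxel -> kept
])
kept = redundancy_free_sample(pts, voxel_size=0.05)  # 2 points survive
```

In an online system this filter would run per frame against a persistent voxel hash, so already-mapped regions spawn no new Gaussians.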
Yijia Guo
Peking University
3DV
Tong Hu
National Biomedical Imaging Center, Peking University
Zhiwei Li
School of Software, Nanchang University
Liwen Hu
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Keming Qian
National Biomedical Imaging Center, Peking University
Xitong Lin
Shenzhen International Graduate School, Tsinghua University
Shengbo Chen
School of Software, Nanchang University
Tiejun Huang
Professor, School of Computer Science, Peking University
Visual Information Processing
Lei Ma
National Biomedical Imaging Center, Peking University