🤖 AI Summary
This work addresses real-time online 3D Gaussian Splatting (3DGS) reconstruction from unposed image streams, that is, streams for which no camera pose estimates are available in advance. We propose the first content-adaptive refinement framework for streaming inputs. It models each frame through differentiable Gaussian parameter prediction, introduces reliable cross-frame pixel matching and feature aggregation to improve geometric consistency and suppress redundant Gaussians, and combines photometric consistency with density-adaptive pruning, eliminating the need for an initial point cloud or predefined camera parameters. Across multiple datasets, our method achieves reconstruction quality on par with optimization-based approaches while running 150× faster. It also generalizes well to out-of-distribution scenes.
📝 Abstract
The advent of 3D Gaussian Splatting (3DGS) has advanced 3D scene reconstruction and novel view synthesis. With growing interest in interactive applications that need immediate feedback, real-time online 3DGS reconstruction is in high demand. However, no existing method yet meets this demand, owing to three main challenges: the absence of predetermined camera parameters, the need for generalizable 3DGS optimization, and the necessity of reducing redundancy. We propose StreamGS, an online generalizable 3DGS reconstruction method for unposed image streams, which progressively transforms image streams into 3D Gaussian streams by predicting and aggregating per-frame Gaussians. Our method overcomes the limitations of the initial point reconstruction \cite{dust3r} on out-of-domain (OOD) inputs by introducing a content-adaptive refinement. The refinement enhances cross-frame consistency by establishing reliable pixel correspondences between adjacent frames. These correspondences further help merge redundant Gaussians through cross-frame feature aggregation. The density of Gaussians is thereby reduced, which makes online reconstruction practical by significantly lowering computational and memory costs. Extensive experiments on diverse datasets demonstrate that StreamGS achieves quality on par with optimization-based approaches while running 150 times faster, and exhibits superior generalizability in handling OOD scenes.
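The idea of merging redundant Gaussians through correspondences between adjacent frames can be illustrated with a minimal sketch. This is not the StreamGS implementation: the `Gaussian` record, the mutual-nearest-neighbour test on cosine similarity, the `min_sim` threshold, and the opacity-weighted averaging are all assumptions chosen for illustration, and a real system would operate on dense per-pixel predictions on the GPU rather than on small Python lists.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gaussian:
    """Hypothetical, stripped-down Gaussian: centre, descriptor, opacity."""
    mean: Tuple[float, ...]
    feature: Tuple[float, ...]
    opacity: float


def _cosine(a, b) -> float:
    """Cosine similarity between two descriptors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def mutual_matches(prev: List[Gaussian], curr: List[Gaussian],
                   min_sim: float = 0.9) -> List[Tuple[int, int]]:
    """Keep (i, j) only if i and j are each other's best match
    and their similarity reaches min_sim (assumed reliability test)."""
    if not prev or not curr:
        return []
    sim = [[_cosine(p.feature, c.feature) for c in curr] for p in prev]
    best_for_prev = [max(range(len(curr)), key=lambda j: sim[i][j])
                     for i in range(len(prev))]
    best_for_curr = [max(range(len(prev)), key=lambda i: sim[i][j])
                     for j in range(len(curr))]
    return [(i, j) for i, j in enumerate(best_for_prev)
            if best_for_curr[j] == i and sim[i][j] >= min_sim]


def _blend(a, b, wa: float, wb: float) -> Tuple[float, ...]:
    """Weighted combination of two equal-length tuples."""
    return tuple(wa * x + wb * y for x, y in zip(a, b))


def merge_stream(prev: List[Gaussian], curr: List[Gaussian],
                 min_sim: float = 0.9) -> List[Gaussian]:
    """Fuse matched Gaussians into one; append unmatched new ones."""
    matches = mutual_matches(prev, curr, min_sim)
    matched_curr = {j for _, j in matches}
    merged = list(prev)
    for i, j in matches:
        a, b = prev[i], curr[j]
        total = a.opacity + b.opacity
        wa, wb = (a.opacity / total, b.opacity / total) if total > 0 else (0.5, 0.5)
        merged[i] = Gaussian(
            mean=_blend(a.mean, b.mean, wa, wb),
            feature=_blend(a.feature, b.feature, wa, wb),
            opacity=max(a.opacity, b.opacity),
        )
    # Gaussians with no reliable match are treated as new scene content.
    merged.extend(c for j, c in enumerate(curr) if j not in matched_curr)
    return merged


if __name__ == "__main__":
    prev = [Gaussian((0.0, 0.0, 0.0), (1.0, 0.0), 0.5),
            Gaussian((1.0, 0.0, 0.0), (0.0, 1.0), 0.5)]
    curr = [Gaussian((0.2, 0.0, 0.0), (1.0, 0.01), 0.5),
            Gaussian((5.0, 0.0, 0.0), (-1.0, 0.0), 0.5)]
    scene = merge_stream(prev, curr)
    print(len(prev) + len(curr), "Gaussians before merging,", len(scene), "after")
```

In this toy setup, each matched pair collapses into a single Gaussian, so the scene grows only by the number of unmatched Gaussians in the new frame. That is the mechanism by which correspondence-based merging keeps density, and with it memory and compute, in check as the stream progresses.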