🤖 AI Summary
This work addresses the severe degradation of 3D Gaussian Splatting (3DGS) reconstruction quality under sparse training views (only 3–12 images), which manifests as "floaters" and "background collapse" at unseen viewpoints. To tackle this, we propose a depth-prior-guided optimization framework. Our key contributions are: (1) a geometry-aware depth prior derived from monocular depth estimation, which enforces the spatial plausibility of the Gaussian distribution; (2) an unseen-viewpoint regularization module that explicitly improves generalization under sparse view coverage; and (3) an adaptive joint geometry-density pruning strategy that improves reconstruction compactness and stability. By integrating differentiable Gaussian rendering, depth-guided optimization, and viewpoint-aware regularization, our method achieves state-of-the-art performance on the Mip-NeRF360, LLFF, and DTU benchmarks, reconstructing forward-facing scenes from as few as three input images and unbounded scenes from twelve, while keeping training efficient and rendering real-time.
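To make contribution (1) concrete, the sketch below shows one common way a monocular depth prior can be attached to the splatting objective: a scale-invariant correlation penalty between the depth rendered from the Gaussians and a monocular depth map. This is a minimal sketch of one plausible realization, not the paper's exact formulation; the function name, tensor shapes, and the choice of Pearson correlation are illustrative assumptions.

```python
import torch

def depth_prior_loss(rendered_depth: torch.Tensor,
                     mono_depth: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    """Scale-invariant depth prior: maximize the Pearson correlation between
    the depth rendered from the Gaussians and a monocular depth estimate.
    Both inputs are (H, W) tensors for a single training view."""
    r = rendered_depth.flatten()
    m = mono_depth.flatten()
    r = r - r.mean()
    m = m - m.mean()
    corr = (r * m).sum() / (r.norm() * m.norm() + eps)
    # Correlation lies in [-1, 1]; the loss is minimized when the two depth
    # maps are linearly aligned, which sidesteps the unknown scale and shift
    # of monocular depth predictions.
    return 1.0 - corr
```

In practice such a term would typically be added to the photometric loss with a small weight, e.g. `loss = l1_rgb + 0.05 * depth_prior_loss(d_render, d_mono)`; the weight here is an illustrative value, not a published setting.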
📝 Abstract
3D Gaussian Splatting (3DGS) has recently enabled real-time rendering of unbounded 3D scenes for novel view synthesis. However, this technique requires dense training views to accurately reconstruct 3D geometry. A limited number of input views will significantly degrade reconstruction quality, resulting in artifacts such as "floaters" and "background collapse" at unseen viewpoints. In this work, we introduce SparseGS, an efficient training pipeline designed to address the limitations of 3DGS in scenarios with sparse training views. SparseGS incorporates depth priors, novel depth rendering techniques, and a pruning heuristic to mitigate floater artifacts, alongside an Unseen Viewpoint Regularization module to alleviate background collapse. Our extensive evaluations on the Mip-NeRF360, LLFF, and DTU datasets demonstrate that SparseGS achieves high-quality reconstruction in both unbounded and forward-facing scenarios, with as few as 12 and 3 input images, respectively, while maintaining fast training and real-time rendering capabilities.
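To illustrate what a floater-pruning heuristic might look like in this setting, here is a minimal sketch that drops Gaussians that are nearly transparent or that sit well in front of the monocular depth prior at the pixel they project to (floaters typically hover between the camera and the true surface). The function name, inputs, and thresholds are assumptions for illustration only; the actual heuristic used by SparseGS may differ.

```python
import torch

def floater_keep_mask(gaussian_depths: torch.Tensor,
                      gaussian_opacities: torch.Tensor,
                      prior_depth_at_proj: torch.Tensor,
                      near_margin: float = 0.1,
                      min_opacity: float = 0.005) -> torch.Tensor:
    """Return a boolean keep-mask over Gaussians for one training view.

    gaussian_depths:      (N,) depth of each Gaussian center in this view.
    gaussian_opacities:   (N,) or (N, 1) learned opacities.
    prior_depth_at_proj:  (N,) monocular depth prior sampled at the pixel
                          each Gaussian projects to.
    """
    opac = gaussian_opacities.reshape(-1)
    nearly_transparent = opac < min_opacity
    # Flag Gaussians that float clearly in front of the depth prior.
    in_front_of_prior = gaussian_depths < (1.0 - near_margin) * prior_depth_at_proj
    return ~(nearly_transparent | in_front_of_prior)
```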