🤖 AI Summary
This work addresses the challenge of using monocular depth priors to improve geometric accuracy and rendering quality in Gaussian splatting when precise depth data are unavailable. The authors propose a weakly supervised training framework that incorporates scale-ambiguous and noisy monocular depth maps as priors. By analyzing geometric consistency, the method identifies ill-posed regions and applies depth regularization only within these areas, which keeps erroneous depth from corrupting well-constrained structures. Combined with a scale-alignment strategy and off-the-shelf depth estimators, the approach fits into existing Gaussian splatting pipelines. Experiments across multiple datasets show consistent improvements in both geometry and rendering fidelity across the Gaussian splatting variants and monocular depth backbones tested.
📝 Abstract
Using accurate depth priors in 3D Gaussian Splatting helps mitigate artifacts caused by sparse training data and textureless surfaces. However, acquiring accurate depth maps requires specialized acquisition systems. Foundation monocular depth estimation models offer a cost-effective alternative, but they suffer from scale ambiguity, multi-view inconsistency, and local geometric inaccuracies, which can degrade rendering performance when applied naively. This paper addresses the challenge of reliably leveraging monocular depth priors for Gaussian Splatting (GS) rendering enhancement. To this end, we introduce a training framework integrating scale-ambiguous and noisy depth priors into geometric supervision. We highlight the importance of learning from weakly aligned depth variations. We introduce a method to isolate ill-posed geometry for selective monocular depth regularization, restricting the propagation of depth inaccuracies into well-reconstructed 3D structures. Extensive experiments across diverse datasets show consistent improvements in geometric accuracy, leading to more faithful depth estimation and higher rendering quality across different GS variants and monocular depth backbones tested.
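To make the idea of selective regularization under scale ambiguity concrete, here is a minimal NumPy sketch. It is not the paper's implementation: the function names, the affine (scale and shift) alignment model, the choice to fit the alignment on well-constrained pixels only, and the L1 penalty are all assumptions made for illustration. The abstract does not specify how ill-posed regions are detected, so the sketch takes `ill_posed_mask` as a given input.

```python
import numpy as np


def align_scale_shift(mono_depth, rendered_depth, mask):
    """Least-squares scale s and shift t so that s * mono + t ~ rendered on `mask`.

    Hypothetical helper: the affine model is an assumption, not the paper's method.
    `mask` must select at least two pixels with distinct monocular depth values.
    """
    x = mono_depth[mask].ravel()
    y = rendered_depth[mask].ravel()
    # Design matrix [x, 1] for the linear model y = s * x + t.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)


def selective_depth_loss(rendered_depth, mono_depth, ill_posed_mask):
    """Mean L1 depth penalty restricted to pixels flagged as ill-posed.

    Assumption: the alignment is fitted on the well-constrained pixels and the
    penalty is applied only on the ill-posed ones, so noisy monocular depth
    cannot pull on geometry that is already well reconstructed.
    """
    if not ill_posed_mask.any():
        return 0.0
    s, t = align_scale_shift(mono_depth, rendered_depth, ~ill_posed_mask)
    aligned = s * mono_depth + t
    return float(np.abs(rendered_depth - aligned)[ill_posed_mask].mean())
```

In an actual Gaussian splatting training loop the same computation would be written in a differentiable framework so that the penalty can back-propagate into the Gaussians; the NumPy version only shows the data flow.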