AI Summary
To address multi-view inconsistency and geometric detail distortion in 3D scene inpainting, this paper proposes a high-fidelity 3D Gaussian splatting (3DGS)-based inpainting framework that leverages sparse inpainted views. The method introduces two key innovations: (1) an automatic mask optimization strategy guided by region-aware uncertainty estimation, in which Gaussian scene filtering and back-projection refine occlusion masks to improve localization accuracy and boundary naturalness; and (2) a hybrid 3DGS-NeRF modeling scheme that enables joint multi-view training and fine-grained detail enhancement. Extensive experiments on standard benchmarks demonstrate that the proposed approach achieves superior visual realism, cross-view consistency, and geometric fidelity compared with state-of-the-art methods.
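The summary above stays at a high level. As a concrete illustration of the mask refinement idea, the sketch below projects a filtered subset of Gaussian centers back into a camera view to obtain a refined 2D occlusion mask. It is a minimal sketch under assumed inputs (Gaussian means, a boolean filter produced by scene filtering, pinhole intrinsics and extrinsics); all names such as `backproject_mask`, `keep_flags`, and `dilate_px` are hypothetical and are not taken from the paper.

```python
import numpy as np

def backproject_mask(gaussian_centers, keep_flags, K, w2c, hw, dilate_px=3):
    """Hypothetical sketch: project filtered Gaussian centers into one view
    to obtain a refined 2D occlusion mask (not the authors' implementation).

    gaussian_centers: (N, 3) world-space Gaussian means
    keep_flags:       (N,) bool, True for Gaussians inside the region to inpaint
                      (e.g. selected by Gaussian scene filtering)
    K:                (3, 3) camera intrinsics
    w2c:              (4, 4) world-to-camera extrinsics
    hw:               (H, W) image size
    """
    H, W = hw
    pts = gaussian_centers[keep_flags]                        # scene filtering
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], 1)  # homogeneous coords
    cam = (w2c @ pts_h.T).T[:, :3]                            # world -> camera
    cam = cam[cam[:, 2] > 1e-6]                               # keep points in front
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                               # perspective divide

    mask = np.zeros((H, W), dtype=bool)
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)
    mask[v, u] = True

    # crude morphological dilation to close gaps between projected points,
    # standing in for the paper's boundary refinement step
    for _ in range(dilate_px):
        mask[1:, :] |= mask[:-1, :]; mask[:-1, :] |= mask[1:, :]
        mask[:, 1:] |= mask[:, :-1]; mask[:, :-1] |= mask[:, 1:]
    return mask
```

In a full pipeline, the refined per-view masks would then drive the 2D inpainting and the subsequent 3DGS optimization; that wiring is beyond this sketch.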
Abstract
Recent advancements in multi-view 3D reconstruction and novel-view synthesis, particularly through Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have greatly enhanced the fidelity and efficiency of 3D content creation. However, inpainting 3D scenes remains a challenging task due to the inherent irregularity of 3D structures and the critical need for maintaining multi-view consistency. In this work, we propose a novel 3D Gaussian inpainting framework that reconstructs complete 3D scenes by leveraging sparse inpainted views. Our framework incorporates an automatic Mask Refinement Process and region-wise Uncertainty-guided Optimization. Specifically, we refine the inpainting mask using a series of operations, including Gaussian scene filtering and back-projection, enabling more accurate localization of occluded regions and realistic boundary restoration. Furthermore, our Uncertainty-guided Fine-grained Optimization strategy, which estimates the importance of each region across multi-view images during training, alleviates multi-view inconsistencies and enhances the fidelity of fine details in the inpainted results. Comprehensive experiments conducted on diverse datasets demonstrate that our approach outperforms existing state-of-the-art methods in both visual quality and view consistency.
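To make the region-wise, uncertainty-guided optimization more concrete, below is a minimal sketch of how per-region uncertainty could modulate a photometric training loss across views. The names (`uncertainty_weighted_loss`, `region_masks`) and the specific weighting `1 / (1 + uncertainty)` are assumptions for illustration, not the authors' formulation.

```python
import torch

def uncertainty_weighted_loss(rendered, target, region_masks, uncertainty):
    """Hypothetical sketch of a region-wise, uncertainty-guided photometric loss.

    rendered, target: (B, 3, H, W) rendered views and inpainted reference images
    region_masks:     (B, R, H, W) soft masks partitioning each view into R regions
    uncertainty:      (B, R) estimated uncertainty per region (higher = less reliable,
                      e.g. derived from photometric variance across views)
    """
    # down-weight regions whose multi-view evidence is inconsistent
    weights = 1.0 / (1.0 + uncertainty)                       # (B, R)
    per_pixel = (rendered - target).abs().mean(dim=1)         # (B, H, W) L1 error

    loss = 0.0
    for r in range(region_masks.shape[1]):
        m = region_masks[:, r]                                # (B, H, W)
        region_err = (per_pixel * m).sum(dim=(1, 2)) / m.sum(dim=(1, 2)).clamp(min=1.0)
        loss = loss + (weights[:, r] * region_err).mean()
    return loss
```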