🤖 AI Summary
Existing 3D Gaussian Splatting (3DGS) inpainting methods rely on 2D generative models, struggle to preserve cross-modal spatiotemporal consistency, and require time-consuming retraining of Gaussian parameters. To address these limitations, this paper proposes the first search-and-place inpainting framework that operates entirely in 3D space. Leveraging the structural redundancy inherent in driving scenes, the method extracts multi-scale local contextual features from a complete 3DGS reconstruction, performs a structured 3D spatial search to identify geometrically and semantically similar patches, and then substitutes them into the missing region with a fusion-based optimization. This enables end-to-end 3D content completion without invoking 2D diffusion or GAN models or retraining Gaussian parameters. Evaluated on multiple autonomous driving datasets, the approach achieves state-of-the-art performance, significantly improving inpainting accuracy and cross-modal compatibility while generalizing well to diverse real-world scenes.
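The search step described above can be pictured as nearest-neighbor retrieval over patch feature descriptors. The following is a minimal illustrative sketch, not the paper's implementation: it assumes each candidate patch of the reconstructed scene is summarized by a fixed-length descriptor, and the missing region's surrounding context is matched against all candidates by cosine similarity. All names and data here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_search(query, candidates, k=3):
    """Return indices of the k candidate patches most similar to the query."""
    q = query / np.linalg.norm(query)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per candidate
    return np.argsort(-scores)[:k]      # best matches first

# Hypothetical data: 500 candidate patch descriptors of dimension 64,
# plus one query descriptor abstracted from the hole's local context.
candidates = rng.normal(size=(500, 64))
query = candidates[42] + 0.05 * rng.normal(size=64)  # a near-duplicate patch

best = cosine_search(query, candidates)
print(best[0])  # the near-duplicate (index 42) ranks first
```

In a real 3DGS scene the descriptors would come from the feature-embedded Gaussians at multiple scales, and the search space could be pruned by spatial structure rather than scanned exhaustively.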
📝 Abstract
This paper presents GS-RoadPatching, an inpainting method that completes driving scenes by referring to their completely reconstructed regions, represented by 3D Gaussian Splatting (3DGS). Unlike existing 3DGS inpainting methods, which perform generative completion by relying on 2D perspective-view diffusion or GAN models to predict limited appearance or depth cues for missing regions, our approach enables substitutional scene inpainting and editing directly in the 3DGS modality, freeing it from the spatial-temporal consistency requirements of 2D cross-modal predictions and eliminating the need for time-intensive retraining of Gaussians. Our key insight is that the highly repetitive patterns in driving scenes often share multi-modal similarities within the implicit 3DGS feature space and are therefore particularly suitable for structural matching, enabling effective 3DGS-based substitutional inpainting. Practically, we construct feature-embedded 3DGS scenes and introduce a patch measurement method that abstracts local context at different scales; we then propose a structural search method to efficiently find candidate patches in 3D space. Finally, we propose a simple yet effective substitution-and-fusion optimization for better visual harmony. Extensive experiments on multiple publicly available driving-scene datasets demonstrate the effectiveness and efficiency of the proposed method, and the results validate that it achieves state-of-the-art performance compared to baseline methods in terms of both quality and interoperability. Additional experiments on general scenes also demonstrate the applicability of the proposed 3D inpainting strategy. The project page and code are available at: https://shanzhaguoo.github.io/GS-RoadPatching/
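To make the substitution-and-fusion idea concrete, here is a minimal sketch under the assumption that fusion near the seam can be approximated by distance-weighted blending of a per-Gaussian attribute (e.g., color) on a 2D grid. The function name, feathering scheme, and data are illustrative, not the paper's actual optimization.

```python
import numpy as np

def fuse_patch(scene_attr, patch_attr, mask, feather=2.0):
    """Blend patch attributes into the masked (hole) region of an attribute grid.

    scene_attr, patch_attr: (H, W) float arrays of a per-Gaussian attribute.
    mask: boolean (H, W), True inside the hole being replaced.
    Blend weights grow with distance from the hole boundary, so the patch
    dominates deep inside while the seam transitions smoothly into the
    surrounding reconstruction.
    """
    outside = np.argwhere(~mask)           # coordinates of cells outside the hole
    fused = scene_attr.copy()
    for y, x in np.argwhere(mask):
        # distance from this hole cell to the nearest cell outside the hole
        d = np.min(np.hypot(outside[:, 0] - y, outside[:, 1] - x))
        w = min(d / feather, 1.0)          # 0 at the seam -> 1 deep inside
        fused[y, x] = w * patch_attr[y, x] + (1 - w) * scene_attr[y, x]
    return fused

# Toy example: a 4x4 hole in an 8x8 grid, scene attribute 0, patch attribute 1.
scene = np.zeros((8, 8))
patch = np.ones((8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True

fused = fuse_patch(scene, patch, mask)
print(fused[4, 4])  # deep inside the hole: fully the patch value (1.0)
print(fused[2, 2])  # at the seam: blended toward the surrounding scene (0.5)
```

In the actual method the blending target would be Gaussian parameters in 3D rather than a 2D grid, and the final harmony is achieved by optimization rather than a fixed feathering weight.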