🤖 AI Summary
This work addresses the challenge of reconstructing static 3D scenes from monocular videos when long-term occlusions caused by dynamic objects severely degrade performance. To this end, the authors propose a generation-assisted Gaussian splatting approach that first removes dynamic regions via motion-aware segmentation and then employs a diffusion model to inpaint the occluded areas, yielding pseudo-ground-truth supervision. A learnable authenticity scalar attached to each Gaussian primitive dynamically modulates its opacity during splatting, enabling authenticity-aware rendering. The study presents the first integration of generative inpainting with Gaussian splatting, introduces a learnable authenticity-aware weighting mechanism, and constructs Trajectory-Match, the first dataset containing ground-truth static scenes aligned with real-world trajectories. Experiments on DAVIS and Trajectory-Match demonstrate that the method significantly outperforms existing approaches under large-scale, persistent occlusion, achieving state-of-the-art performance.
📝 Abstract
Reconstructing a static 3D scene from monocular video with dynamic objects is important for numerous applications such as virtual reality and autonomous driving. Current approaches typically rely on the visible background for static scene reconstruction, limiting their ability to recover regions occluded by dynamic objects. In this paper, we propose GA-GS, a Generation-Assisted Gaussian Splatting method for static scene reconstruction. The key innovation of our work lies in leveraging generation to assist in reconstructing occluded regions. We employ a motion-aware module to segment and remove dynamic regions, and then use a diffusion model to inpaint the occluded areas, providing pseudo-ground-truth supervision. To balance contributions from the real background and generated regions, we introduce a learnable authenticity scalar for each Gaussian primitive, which dynamically modulates opacity during splatting for authenticity-aware rendering and supervision. Since no existing dataset provides the ground-truth static scene of a video with dynamic objects, we construct a dataset named Trajectory-Match, using a fixed-path robot to record each scene with and without dynamic objects, enabling quantitative evaluation of occluded-region reconstruction. Extensive experiments on both DAVIS and our dataset show that GA-GS achieves state-of-the-art performance in static scene reconstruction, especially in challenging scenarios with large-scale, persistent occlusions.
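The abstract does not give the exact formulation of the authenticity-aware opacity modulation, but one plausible reading is a sigmoid-activated learnable scalar per Gaussian that attenuates the primitive's base opacity before alpha compositing. The minimal NumPy sketch below illustrates that idea; the parameter names (`authenticity`, `opacity`) and the sigmoid gating are assumptions for illustration, not the paper's verified implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical per-Gaussian parameters for N primitives.
N = 5
rng = np.random.default_rng(0)
opacity = sigmoid(rng.normal(size=N))      # base opacity of each Gaussian, in (0, 1)
authenticity = rng.normal(size=N)          # learnable authenticity logit per Gaussian

# Authenticity-aware splatting: the sigmoid of the learnable scalar
# rescales each primitive's opacity, so low-authenticity (generated-region)
# Gaussians contribute less during rendering and supervision.
effective_opacity = sigmoid(authenticity) * opacity

assert effective_opacity.shape == (N,)
assert np.all(effective_opacity <= opacity)  # gating can only attenuate opacity
```

Under this reading, gradients flowing through `effective_opacity` would let the model down-weight inpainted regions wherever they conflict with the real observed background.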