🤖 AI Summary
This work addresses the degradation in 3D Gaussian Splatting (3DGS) reconstruction quality on scenes containing transient objects, a problem rooted in the circular dependency between transient detection and static-scene reconstruction. The authors propose DualSplat, a Failure-to-Prior framework that converts the failures of a first reconstruction pass into explicit priors for a second pass. Object-level pseudo-masks are built by combining photometric residuals, feature mismatches, and SAM2 instance boundaries. These masks guide the second-pass optimization, while a lightweight MLP refines them online by gradually shifting from prior-based supervision to self-consistency. On the RobustNeRF and NeRF On-the-go benchmarks, the method outperforms existing baselines, with the clearest gains in transient-heavy scenes and transient regions.
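To make the pseudo-mask idea concrete, here is a minimal sketch of one way per-pixel failure cues could be promoted to object-level masks. It is not the authors' implementation: the function name `object_level_pseudo_mask`, the weights `w_photo` and `w_feat`, the `threshold`, and the mean-score voting rule are all assumptions made for illustration. The instance label map is assumed to come from a segmenter such as SAM2, computed beforehand.

```python
from collections import defaultdict


def object_level_pseudo_mask(residual, feat_mismatch, instance_ids,
                             w_photo=0.5, w_feat=0.5, threshold=0.5):
    """Fuse per-pixel failure cues into an object-level binary mask.

    residual, feat_mismatch: 2D lists of floats in [0, 1], same shape.
    instance_ids: 2D list of ints, one segment label per pixel
                  (assumed to be precomputed by a segmenter such as SAM2).
    The weights, threshold, and voting rule are illustrative assumptions.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)

    # Accumulate a combined failure score for every segment.
    for r_row, f_row, id_row in zip(residual, feat_mismatch, instance_ids):
        for r, f, seg in zip(r_row, f_row, id_row):
            sums[seg] += w_photo * r + w_feat * f
            counts[seg] += 1

    # A segment is flagged as transient when its mean score is high.
    transient = {seg for seg in sums if sums[seg] / counts[seg] > threshold}

    # Every pixel of a flagged segment is masked, so weak-cue pixels
    # inside a transient object are still covered.
    return [[1 if seg in transient else 0 for seg in row]
            for row in instance_ids]


residual = [[0.9, 0.8, 0.1],
            [0.7, 0.2, 0.0]]
feat_mismatch = [[0.8, 0.9, 0.0],
                 [0.2, 0.1, 0.1]]
instance_ids = [[1, 1, 0],
                [1, 0, 0]]

mask = object_level_pseudo_mask(residual, feat_mismatch, instance_ids)
```

In this toy input, the bottom-left pixel has a weak feature-mismatch cue but is still masked because the segment it belongs to is flagged as a whole. That is the sense in which instance boundaries turn fragmentary evidence into object-level masks.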
📝 Abstract
While 3D Gaussian Splatting (3DGS) achieves real-time photorealistic rendering, its performance degrades significantly when training images contain transient objects that violate multi-view consistency. Existing methods face a circular dependency: accurate transient detection requires a well-reconstructed static scene, while clean reconstruction itself depends on reliable transient masks. We address this challenge with DualSplat, a Failure-to-Prior framework that converts first-pass reconstruction failures into explicit priors for a second reconstruction stage. We observe that transients, which appear in only a subset of views, often manifest as incomplete fragments during conservative initial training. We exploit these failures to construct object-level pseudo-masks by combining photometric residuals, feature mismatches, and SAM2 instance boundaries. These pseudo-masks then guide a clean second-pass 3DGS optimization, while a lightweight MLP refines them online by gradually shifting from prior supervision to self-consistency. Experiments on RobustNeRF and NeRF On-the-go show that DualSplat outperforms existing baselines, demonstrating particularly clear advantages in transient-heavy scenes and transient regions.
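The gradual shift from prior supervision to self-consistency can be pictured as a time-varying blend of two loss terms. The sketch below is a guess at the general shape, not the schedule used in the paper: the linear decay, the step counts `warmup_steps` and `decay_steps`, and the names `prior_weight` and `blended_mask_loss` are hypothetical.

```python
def prior_weight(step, warmup_steps=1000, decay_steps=4000):
    """Weight on the pseudo-mask prior at a given training step.

    Assumed schedule: full trust in the prior during warmup, then a
    linear decay to zero over `decay_steps` steps.
    """
    if step < warmup_steps:
        return 1.0
    progress = (step - warmup_steps) / decay_steps
    return max(0.0, 1.0 - progress)


def blended_mask_loss(loss_prior, loss_self, step,
                      warmup_steps=1000, decay_steps=4000):
    """Blend prior supervision with a self-consistency term.

    loss_prior: loss of the predicted mask against the pseudo-mask.
    loss_self:  loss derived from the model's own rendering residuals.
    Both are passed as plain floats here to keep the sketch framework-free.
    """
    alpha = prior_weight(step, warmup_steps, decay_steps)
    return alpha * loss_prior + (1.0 - alpha) * loss_self


early = blended_mask_loss(2.0, 4.0, step=0)       # prior only
middle = blended_mask_loss(2.0, 4.0, step=3000)   # equal mix
late = blended_mask_loss(2.0, 4.0, step=10000)    # self-consistency only
```

Under this assumed schedule, the mask predictor starts out following the first-pass pseudo-masks and ends up driven entirely by its own consistency signal, which would let it correct errors in the prior once the static scene is reliable.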