🤖 AI Summary
Existing 3D Gaussian Splatting methods struggle to distinguish static scene components from transient objects, leading to artifact-prone Gaussian densification and severely degrading novel-view synthesis quality. To address this, we propose a decoupled dynamic modeling framework. Our method introduces a novel *delayed Gaussian growth* strategy, wherein Gaussian splitting and cloning are deferred until after static geometry convergence. Additionally, we design a *scale-cascaded mask bootstrapping* mechanism that jointly leverages multi-scale feature similarity supervision and progressive mask-guided optimization to explicitly separate static reconstruction from transient modeling. This approach fundamentally suppresses transient overfitting at its source. Evaluated on multiple challenging datasets containing complex transient objects, our method significantly outperforms state-of-the-art approaches: it substantially reduces rendering artifacts while enabling high-fidelity, real-time, photorealistic novel-view synthesis.
📝 Abstract
3D Gaussian Splatting (3DGS) has gained significant attention for its real-time, photo-realistic rendering in novel-view synthesis and 3D modeling. However, existing methods struggle with accurately modeling scenes affected by transient objects, leading to artifacts in the rendered images. We identify that the Gaussian densification process, while enhancing scene detail capture, unintentionally contributes to these artifacts by growing additional Gaussians that model transient disturbances. To address this, we propose RobustSplat, a robust solution based on two critical designs. First, we introduce a delayed Gaussian growth strategy that prioritizes optimizing static scene structure before allowing Gaussian splitting/cloning, mitigating overfitting to transient objects in early optimization. Second, we design a scale-cascaded mask bootstrapping approach that first leverages lower-resolution feature similarity supervision for reliable initial transient mask estimation, taking advantage of its stronger semantic consistency and robustness to noise, and then progresses to high-resolution supervision to achieve more precise mask prediction. Extensive experiments on multiple challenging datasets show that our method outperforms existing methods, clearly demonstrating the robustness and effectiveness of our method. Our project page is https://fcyycf.github.io/RobustSplat/.