🤖 AI Summary
This work addresses the ghosting artifacts in 3D Gaussian Splatting (3DGS) reconstructions caused by transient objects, such as pedestrians and vehicles, in multi-view imagery. To mitigate this issue, the authors propose a semantic-guided framework for transient object removal that integrates semantic classification into the 3DGS pipeline for the first time. Using the CLIP vision-language model, the method performs category-aware filtering of Gaussians by accumulating, per Gaussian, similarity scores between rendered views and distractor text prompts. Gaussians whose accumulated score exceeds a calibrated threshold are subjected to opacity regularization and periodic pruning, without relying on motion cues and with minimal additional memory overhead. Evaluated on the RobustNeRF benchmark, the method consistently improves reconstruction quality over the original 3DGS across all four sequences while preserving real-time rendering.
📝 Abstract
Transient objects in casual multi-view captures cause ghosting artifacts in 3D Gaussian Splatting (3DGS) reconstruction. Existing solutions rely either on scene decomposition, at significant memory cost, or on motion-based heuristics that are vulnerable to parallax ambiguity. This work proposes a semantic filtering framework for category-aware transient removal using vision-language models. CLIP similarity scores between rendered views and distractor text prompts are accumulated per Gaussian across training iterations. Gaussians whose accumulated score exceeds a calibrated threshold undergo opacity regularization and periodic pruning. Unlike motion-based approaches, semantic classification resolves parallax ambiguity by identifying object categories independently of motion patterns. Experiments on the RobustNeRF benchmark demonstrate consistent improvement in reconstruction quality over vanilla 3DGS across four sequences, while maintaining minimal memory overhead and real-time rendering performance. Threshold calibration and comparisons with baselines validate semantic guidance as a practical strategy for transient removal in scenarios with predictable distractor categories.
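The accumulate, threshold, regularize, and prune loop described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the rule that a view-level similarity is credited to every Gaussian visible in that view, the multiplicative opacity shrink (a stand-in for a gradient step on an opacity penalty), and the `min_opacity` cutoff are all assumptions made for illustration. In a real pipeline the embeddings would come from a CLIP image and text encoder rather than hand-written vectors.

```python
import numpy as np


def cosine_similarity(image_emb, text_emb):
    """Cosine similarity between an image embedding and a text embedding."""
    denom = np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
    return float(image_emb @ text_emb / denom)


def accumulate(scores, visible_mask, similarity):
    """Credit a view's similarity to each Gaussian visible in that view.

    ASSUMPTION: per-Gaussian attribution via a visibility mask; the abstract
    does not specify how scores are assigned to individual Gaussians.
    """
    out = scores.copy()
    out[visible_mask] += similarity
    return out


def flag_transient(scores, threshold):
    """Mark Gaussians whose accumulated score exceeds the threshold."""
    return scores > threshold


def regularize_opacity(opacity, flagged, step=0.5):
    """Shrink the opacity of flagged Gaussians.

    ASSUMPTION: a multiplicative shrink stands in for one optimizer step
    on an opacity regularization term.
    """
    out = opacity.copy()
    out[flagged] *= 1.0 - step
    return out


def keep_mask(opacity, flagged, min_opacity=0.05):
    """Periodic pruning: drop flagged Gaussians whose opacity has decayed."""
    return ~(flagged & (opacity < min_opacity))


# Toy run with four Gaussians; the first two are visible in a distractor view.
scores = np.zeros(4)
visible = np.array([True, True, False, False])
for _ in range(3):  # three training iterations seeing the same view
    scores = accumulate(scores, visible, similarity=0.3)

flagged = flag_transient(scores, threshold=0.5)
opacity = regularize_opacity(np.array([0.08, 0.9, 0.02, 0.8]), flagged)
keep = keep_mask(opacity, flagged)
print(flagged, opacity, keep)
```

In this toy run only the first Gaussian is pruned: it is flagged and its opacity has dropped below the cutoff. The second is flagged but still opaque, so it survives until further regularization, and the third has low opacity but was never flagged, so semantic pruning leaves it alone.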