🤖 AI Summary
In 3D scene reconstruction, generative-prior-based occlusion removal often introduces artifacts and blur, and lacks standardized benchmarks evaluating real-world complexity and viewpoint variation. This paper proposes the first non-generative, end-to-end occlusion removal framework for reconstructing complete, geometrically faithful, artifact-free 3D scenes from incomplete multi-view images. Our contributions are threefold: (1) DeclutterSet—the first benchmark dataset featuring layered occlusions (foreground, midground, background) and significant inter-frame motion across views; (2) an explainable stochastic SSIM loss together with an occlusion annealing regularization; and (3) a unified architecture integrating joint multi-view camera optimization, occlusion-aware volumetric rendering, and geometric consistency constraints. Evaluated on DeclutterSet, our method significantly outperforms state-of-the-art approaches, establishing a robust new baseline for neural 3D scene reconstruction.
📝 Abstract
Recent novel view synthesis (NVS) techniques, including Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), have greatly advanced 3D scene reconstruction with high-quality rendering and realistic detail recovery. Effectively removing occlusions while preserving scene details can further enhance the robustness and applicability of these techniques. However, existing approaches for object and occlusion removal predominantly rely on generative priors, which, despite filling the resulting holes, introduce new artifacts and blurriness. Moreover, existing benchmark datasets for evaluating occlusion removal methods lack realistic complexity and viewpoint variation. To address these issues, we introduce DeclutterSet, a novel dataset featuring diverse scenes with pronounced occlusions distributed across foreground, midground, and background, exhibiting substantial relative motion across viewpoints. We further introduce DeclutterNeRF, an occlusion removal method free from generative priors. DeclutterNeRF combines joint multi-view optimization of learnable camera parameters, occlusion annealing regularization, and an explainable stochastic structural similarity loss, ensuring high-quality, artifact-free reconstructions from incomplete images. Experiments demonstrate that DeclutterNeRF significantly outperforms state-of-the-art methods on our proposed DeclutterSet, establishing a strong baseline for future research.
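The abstract names a stochastic structural similarity loss but does not specify its form. As an illustration only, one plausible reading is standard SSIM evaluated on randomly sampled image patches rather than over the full image, averaged into a loss. The sketch below follows that assumption; the function names, patch size, and sampling scheme are hypothetical, not the paper's implementation.

```python
import numpy as np

def patch_ssim(p, q, c1=0.01**2, c2=0.03**2):
    """Standard single-scale SSIM between two equal-size patches in [0, 1]."""
    mu_p, mu_q = p.mean(), q.mean()
    var_p, var_q = p.var(), q.var()
    cov = ((p - mu_p) * (q - mu_q)).mean()
    return ((2 * mu_p * mu_q + c1) * (2 * cov + c2)) / (
        (mu_p**2 + mu_q**2 + c1) * (var_p + var_q + c2)
    )

def stochastic_ssim_loss(img, ref, n_patches=64, patch=8, rng=None):
    """Hypothetical stochastic SSIM loss: 1 minus the mean SSIM over
    randomly sampled patch locations shared by both images."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = img.shape[:2]
    scores = []
    for _ in range(n_patches):
        # Sample a random top-left corner valid for the patch size.
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        scores.append(patch_ssim(img[y:y + patch, x:x + patch],
                                 ref[y:y + patch, x:x + patch]))
    return 1.0 - float(np.mean(scores))
```

Sampling patches stochastically keeps each step cheap while still exposing the loss to local structure everywhere in the image over the course of training; identical images yield a loss of zero, and the loss grows as local structure diverges.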