🤖 AI Summary
To address distortions in large-scale outdoor street-scene NeRF reconstruction caused by dynamic objects, sparse camera coverage, illumination variations, and texture scarcity, this paper proposes a semantic-guided robust NeRF method. Methodologically, it integrates Grounded SAM–generated segmentation masks—enabling dynamic object removal, sky modeling, and ground-plane geometric regularization—for the first time; introduces learnable appearance embeddings to adaptively correct inter-view illumination inconsistencies; and unifies semantic guidance, multi-scale volumetric rendering, and implicit geometric constraints within the ZipNeRF framework. Evaluated on real-world street-scene datasets, the method achieves significant improvements over baselines: synthesized images exhibit fewer artifacts and sharper edges, with PSNR increased by 2.1 dB and SSIM improved by 0.032.
📝 Abstract
Recent advances in Neural Radiance Fields (NeRF) have shown great potential in 3D reconstruction and novel view synthesis, particularly for indoor and small-scale scenes. However, extending NeRF to large-scale outdoor environments presents challenges such as transient objects, sparse cameras and textures, and varying lighting conditions. In this paper, we propose a segmentation-guided enhancement to NeRF for outdoor street scenes, focusing on complex urban environments. Our approach extends ZipNeRF and utilizes Grounded SAM for segmentation mask generation, enabling effective handling of transient objects, modeling of the sky, and regularization of the ground. We also introduce appearance embeddings to adapt to inconsistent lighting across view sequences. Experimental results demonstrate that our method outperforms the baseline ZipNeRF, improving novel view synthesis quality with fewer artifacts and sharper details.