🤖 AI Summary
Existing distractor-free radiance field methods lack a large-scale real-world dataset for systematic evaluation and generalization studies. To address this gap, this work introduces DF3DV-1K, a large-scale real-world dataset of 1,048 scenes, each providing clean and cluttered image sets across diverse scene themes and distractor types. A curated 41-scene subset, DF3DV-41, is designed to assess method robustness under challenging scenarios. Using images captured with consumer cameras, the work benchmarks nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. It also fine-tunes a diffusion-based 2D enhancer on DF3DV-1K to improve radiance field methods, achieving average improvements of 0.96 dB PSNR and 0.057 LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset.
📝 Abstract
Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. For distractor-free radiance fields, however, a large-scale dataset with clean and cluttered images for each scene is still lacking, which limits progress in this area. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes, each providing clean and cluttered image sets for benchmarking. In total, the dataset contains 89,924 images captured with consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated subset of 41 scenes, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB PSNR and 0.057 LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope DF3DV-1K facilitates the development of distractor-free vision and promotes progress beyond scene-specific approaches.