AI Summary
This work addresses the vulnerability of Generalizable Neural Radiance Fields (GeNeRFs) to transient distractors in sparse-view settings, where such distractors break cross-view structural consistency and degrade reconstruction quality. The authors propose MU-GeNeRF, a multi-view uncertainty-guided framework that, reportedly for the first time, decomposes uncertainty into two components: structural discrepancies across source views and observation anomalies in the target view. A heteroscedastic reconstruction loss combines the two estimates and adaptively modulates the supervision signal. This suppresses the influence of transient distractors and improves geometric consistency and reconstruction robustness. The reported experiments show that the method outperforms existing generalizable NeRF approaches across multiple datasets and achieves performance comparable to scene-specific, per-scene-optimized distractor-free NeRFs.
Abstract
Generalizable Neural Radiance Fields (GeNeRFs) enable high-quality scene reconstruction from sparse views and can generalize to unseen scenes. However, in real-world settings, transient distractors break cross-view structural consistency, corrupting supervision and degrading reconstruction quality. Existing distractor-free NeRF methods rely on per-scene optimization and estimate uncertainty from per-view reconstruction errors, which are not reliable for GeNeRFs and often misjudge inconsistent static structures as distractors. To address this, we propose MU-GeNeRF, a Multi-view Uncertainty-guided, distractor-aware GeNeRF framework designed to make GeNeRFs robust to transient distractors. We decompose distractor awareness into two complementary uncertainty components: Source-view Uncertainty, which captures structural discrepancies across source views caused by viewpoint changes or dynamic factors; and Target-view Uncertainty, which detects observation anomalies in the target image induced by transient distractors. These two uncertainties address distinct error sources and are combined through a heteroscedastic reconstruction loss, which guides the model to adaptively modulate supervision, enabling more robust distractor suppression and geometric modeling. Extensive experiments show that our method not only surpasses existing GeNeRFs but also achieves performance comparable to scene-specific distractor-free NeRFs.
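To make the idea of a heteroscedastic reconstruction loss concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard Gaussian negative log-likelihood form with a per-ray variance, and it assumes the two uncertainties combine as a sum of independent variances; the abstract does not specify either choice. The names `heteroscedastic_loss`, `sigma_src`, and `sigma_tgt` are hypothetical.

```python
import numpy as np


def heteroscedastic_loss(pred_rgb, target_rgb, sigma_src, sigma_tgt, eps=1e-6):
    """Gaussian negative log-likelihood with a per-ray variance.

    pred_rgb, target_rgb: arrays of shape (N, 3), rendered and observed colors.
    sigma_src, sigma_tgt: arrays of shape (N,), hypothetical source-view and
        target-view uncertainty estimates (standard deviations).
    Returns (mean loss, per-ray loss of shape (N,)).
    """
    # Assumption: the two uncertainties act as independent noise sources,
    # so their variances add. The paper may combine them differently.
    var = sigma_src ** 2 + sigma_tgt ** 2 + eps

    # Squared color error per ray, summed over the RGB channels.
    sq_err = np.sum((pred_rgb - target_rgb) ** 2, axis=-1)

    # Residual term: down-weighted where the variance is large.
    # Log term: penalizes large variance so it cannot grow without cost.
    per_ray = sq_err / (2.0 * var) + 0.5 * np.log(var)
    return per_ray.mean(), per_ray


if __name__ == "__main__":
    pred = np.zeros((2, 3))
    target = np.array([[0.0, 0.0, 0.0],   # clean ray: prediction matches
                       [1.0, 1.0, 1.0]])  # ray hit by a transient distractor
    sigma_src = np.array([0.1, 0.1])

    # Same rays, with and without high target-view uncertainty on ray 1.
    _, confident = heteroscedastic_loss(pred, target, sigma_src, np.array([0.1, 0.1]))
    _, uncertain = heteroscedastic_loss(pred, target, sigma_src, np.array([0.1, 2.0]))
    print("distractor ray, low uncertainty: ", confident[1])
    print("distractor ray, high uncertainty:", uncertain[1])
```

In this sketch, raising the uncertainty on the distractor ray shrinks its contribution to the loss, which is the sense in which supervision is adaptively modulated. The log-variance term keeps a model from assigning high uncertainty everywhere.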