AI Summary
Existing generalizable neural radiance fields (NeRFs) suffer severe degradation in generalization and reconstruction quality under few-view settings (1–3 input views). To address this, we propose GoLF-NRT, a Global and Local feature Fusion-based Neural Rendering Transformer and the first framework featuring a global-local dual-path feature fusion mechanism: the global path models scene-level semantic context, while the local path encodes epipolar geometric constraints. GoLF-NRT further introduces 3D sparse attention and kernel regression-guided adaptive ray sampling to improve sampling efficiency and geometric fidelity. Crucially, GoLF-NRT requires no scene-specific priors or fine-tuning; a small number of input views suffices for high-fidelity novel view synthesis. Extensive experiments show that GoLF-NRT outperforms state-of-the-art methods across multiple benchmarks in PSNR, SSIM, and depth error. Notably, under the most challenging 1–2-view configurations, GoLF-NRT excels in geometric consistency and fine-grained texture recovery.
Abstract
Neural Radiance Fields (NeRF) have transformed novel view synthesis by modeling scene-specific volumetric representations directly from images. While generalizable NeRF models can generate novel views across unknown scenes by learning latent ray representations, their performance heavily depends on a large number of multi-view observations. However, with limited input views, these methods experience significant degradation in rendering quality. To address this limitation, we propose GoLF-NRT: a Global and Local feature Fusion-based Neural Rendering Transformer. GoLF-NRT enhances generalizable neural rendering from few input views by leveraging a 3D transformer with efficient sparse attention to capture global scene context. In parallel, it integrates local geometric features extracted along the epipolar line, enabling high-quality scene reconstruction from as few as 1 to 3 input views. Furthermore, we introduce an adaptive sampling strategy based on attention weights and kernel regression, improving the accuracy of transformer-based neural rendering. Extensive experiments on public datasets show that GoLF-NRT achieves state-of-the-art performance across varying numbers of input views, highlighting the effectiveness and superiority of our approach. Code is available at https://github.com/KLMAV-CUC/GoLF-NRT.
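To make the adaptive sampling idea in the abstract more concrete, the following is a minimal sketch of one way attention weights and kernel regression could be combined to place new samples along a ray. It is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The function names `kernel_smooth` and `resample_depths`, the Gaussian kernel, the `bandwidth` and `grid_size` parameters, and the inverse-CDF resampling step are all hypothetical choices made for this example. The per-sample attention weights, which a rendering transformer would produce, are passed in here as a plain list.

```python
import bisect
import math
import random


def kernel_smooth(depths, weights, query_depths, bandwidth):
    """Nadaraya-Watson regression of per-sample weights over depth.

    Assumption: a Gaussian kernel with a fixed bandwidth; the paper may
    use a different kernel or a learned bandwidth.
    """
    smoothed = []
    for q in query_depths:
        k = [math.exp(-0.5 * ((q - d) / bandwidth) ** 2) for d in depths]
        total = max(sum(k), 1e-12)  # guard against underflow far from samples
        smoothed.append(sum(ki * wi for ki, wi in zip(k, weights)) / total)
    return smoothed


def resample_depths(depths, weights, n_new, bandwidth, grid_size=64, rng=None):
    """Draw n_new depths concentrated where the smoothed weights are large.

    depths must be sorted in increasing order; weights are non-negative
    scores (for example attention weights) at those depths.
    """
    rng = rng or random.Random(0)
    near, far = depths[0], depths[-1]
    step = (far - near) / grid_size

    # Evaluate the smoothed weight profile at the centers of a fine grid.
    centers = [near + (i + 0.5) * step for i in range(grid_size)]
    density = [d + 1e-8 for d in kernel_smooth(depths, weights, centers, bandwidth)]

    # Turn the profile into a discrete cumulative distribution.
    total = sum(density)
    cdf, acc = [], 0.0
    for d in density:
        acc += d / total
        cdf.append(acc)

    # Inverse-CDF sampling: pick a bin, then jitter uniformly inside it.
    new_depths = []
    for _ in range(n_new):
        u = rng.random()
        i = min(bisect.bisect_left(cdf, u), grid_size - 1)
        new_depths.append(near + (i + rng.random()) * step)
    return sorted(new_depths)


if __name__ == "__main__":
    coarse = [2.0 + 0.5 * i for i in range(9)]  # depths from 2.0 to 6.0
    attn = [0.0, 0.0, 0.05, 0.2, 0.5, 0.2, 0.05, 0.0, 0.0]  # peak near 4.0
    fine = resample_depths(coarse, attn, n_new=16, bandwidth=0.3)
    print([round(d, 2) for d in fine])
```

In this sketch the smoothing step spreads each coarse weight over neighboring depths, so the resampled points cluster around the likely surface without collapsing onto the coarse sample positions. A smaller `bandwidth` concentrates samples more tightly, at the cost of sensitivity to noisy weights.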