GoLF-NRT: Integrating Global Context and Local Geometry for Few-Shot View Synthesis

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Generalizable neural radiance fields (NeRFs) degrade sharply in rendering quality when only a few input views (1–3) are available. To address this, the paper proposes GoLF-NRT, a Neural Rendering Transformer built around a global-local dual-path feature fusion mechanism: the global path models scene-level context with a 3D transformer using sparse attention, while the local path encodes geometric features sampled along epipolar lines. GoLF-NRT also introduces an adaptive ray sampling strategy, guided by attention weights and kernel regression, to improve sampling efficiency and geometric fidelity. The model requires no scene-specific priors or fine-tuning; a small number of input views is enough for novel view synthesis on unseen scenes. Experiments across multiple benchmarks show that GoLF-NRT outperforms state-of-the-art methods, with reported gains in PSNR, SSIM, and depth error metrics. Under the most challenging 1–2-view configurations, it is reported to do particularly well on geometric consistency and fine-grained texture recovery.

📝 Abstract

Neural Radiance Fields (NeRF) have transformed novel view synthesis by modeling scene-specific volumetric representations directly from images. While generalizable NeRF models can generate novel views across unknown scenes by learning latent ray representations, their performance heavily depends on a large number of multi-view observations. However, with limited input views, these methods experience significant degradation in rendering quality. To address this limitation, we propose GoLF-NRT: a Global and Local feature Fusion-based Neural Rendering Transformer. GoLF-NRT enhances generalizable neural rendering from few input views by leveraging a 3D transformer with efficient sparse attention to capture global scene context. In parallel, it integrates local geometric features extracted along the epipolar line, enabling high-quality scene reconstruction from as few as 1 to 3 input views. Furthermore, we introduce an adaptive sampling strategy based on attention weights and kernel regression, improving the accuracy of transformer-based neural rendering. Extensive experiments on public datasets show that GoLF-NRT achieves state-of-the-art performance across varying numbers of input views, highlighting the effectiveness and superiority of our approach. Code is available at https://github.com/KLMAV-CUC/GoLF-NRT.
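The local path described in the abstract relies on a standard piece of multi-view geometry: 3D points sampled along a target ray project onto a single line (the epipolar line) in each source image, and features can be read off at those pixel locations. The sketch below shows only that projection step with NumPy. The function name `project_ray_samples`, the pinhole camera convention (world-to-camera rotation and translation), and the example values are assumptions chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def project_ray_samples(K_src, R_src, t_src, origin, direction, depths):
    """Project 3D samples along a target ray into a source view.

    Hypothetical helper for illustration. Assumes a pinhole camera with
    intrinsics K_src (3, 3) and world-to-camera pose R_src (3, 3), t_src (3,).
    Returns (N, 2) pixel coordinates and (N,) depths in the source camera.
    """
    # Points along the target ray in world coordinates: o + d * t
    points = origin[None, :] + depths[:, None] * direction[None, :]
    # World -> source camera coordinates
    cam = points @ R_src.T + t_src[None, :]
    # Camera -> homogeneous pixel coordinates, then perspective divide
    pix_h = cam @ K_src.T
    uv = pix_h[:, :2] / pix_h[:, 2:3]
    return uv, cam[:, 2]

# Example: source camera at the world origin, target ray offset by 1 unit in x.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
uv, z = project_ray_samples(
    K, np.eye(3), np.zeros(3),
    origin=np.array([1.0, 0.0, 0.0]),
    direction=np.array([0.0, 0.0, 1.0]),
    depths=np.array([1.0, 2.0, 4.0]),
)
# uv is [[150, 50], [100, 50], [75, 50]]: every sample lies on the line v = 50.
```

In a full pipeline the returned pixel coordinates would be used to interpolate features from the source image's feature map; that step is omitted here because it depends on the encoder, which the abstract does not specify.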

Problem

Research questions and friction points this paper is trying to address.

Improves few-shot view synthesis with global-local feature fusion

Enhances neural rendering from 1-3 views using 3D transformer

Introduces adaptive sampling for accurate transformer-based rendering

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D transformer with sparse attention

Local geometric feature integration

Adaptive sampling with attention weights
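To make the adaptive sampling idea more concrete, here is a minimal sketch of one way attention weights along a ray could drive resampling: the per-sample weights are smoothed over depth with Nadaraya-Watson kernel regression (Gaussian kernel), and new depths are drawn by inverse-CDF sampling from the smoothed curve. The listing does not describe the paper's actual procedure, so the function names, the Gaussian kernel, the bandwidth, and the grid size are all assumptions made for illustration.

```python
import numpy as np

def kernel_smooth(depths, weights, query, bandwidth):
    """Nadaraya-Watson regression of weights over depth (Gaussian kernel).

    Hypothetical helper; the kernel choice and bandwidth are assumptions.
    """
    diff = (query[:, None] - depths[None, :]) / bandwidth
    k = np.exp(-0.5 * diff ** 2)                      # (Q, N) kernel matrix
    return (k @ weights) / np.maximum(k.sum(axis=1), 1e-12)

def resample_depths(depths, weights, n_new, bandwidth, rng):
    """Draw new depths where the smoothed weights are large."""
    grid = np.linspace(depths.min(), depths.max(), 256)
    # Small constant keeps the CDF strictly increasing for interpolation
    density = kernel_smooth(depths, weights, grid, bandwidth) + 1e-8
    cdf = np.cumsum(density)
    cdf = cdf / cdf[-1]
    # Inverse-transform sampling: map uniform draws through the inverse CDF
    u = rng.uniform(size=n_new)
    return np.sort(np.interp(u, cdf, grid))

# Example: coarse samples whose (made-up) attention weights peak near depth 4.
coarse = np.linspace(2.0, 6.0, 16)
attn = np.exp(-((coarse - 4.0) ** 2) / 0.1)
attn = attn / attn.sum()
fine = resample_depths(coarse, attn, n_new=64, bandwidth=0.3,
                       rng=np.random.default_rng(0))
# Most of the 64 new depths cluster around the peak at depth 4.
```

The smoothing step matters because raw attention weights are defined only at the coarse sample locations; regressing them onto a dense grid gives a continuous profile from which new samples can be placed between the original ones.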
