AI Summary
To address detail loss and multi-view inconsistency in high-resolution novel view synthesis from low-resolution inputs, this paper proposes VoxelGridSR, the first end-to-end, attention-driven 3D voxel super-resolution model. Operating directly on voxel grids optimized via NeRF, it models arbitrary-scale 3D super-resolution through a joint explicit voxel representation and implicit NeRF optimization, ensuring strict multi-view consistency in both geometry and appearance. Crucially, it generalizes zero-shot to unseen scenes and arbitrary scale factors without scene-specific fine-tuning. On multiple benchmarks, VoxelGridSR achieves significant PSNR and SSIM improvements over prior methods. Visually, it recovers sharp geometric structures and high-frequency textures, effectively mitigating the over-smoothing artifacts inherent in conventional NeRFs and the view-inconsistency issues of single-image super-resolution approaches.
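The arbitrary-scale property comes from the fact that a feature voxel grid can be queried at any continuous 3D point, so novel views can be sampled at whatever output resolution is desired. A minimal numpy sketch of such a continuous query via trilinear interpolation (all names and shapes here are illustrative, not the paper's actual implementation):

```python
import numpy as np

def trilinear_query(grid, pts):
    """Sample a dense feature voxel grid at continuous coordinates.

    grid: (D, H, W, C) feature volume optimized from LR views.
    pts:  (N, 3) coordinates in voxel index space, each in [0, dim - 1].
    Because any continuous point can be queried, views can be rendered
    at an arbitrary output scale. Illustrative sketch, not the paper's code.
    """
    D, H, W, C = grid.shape
    lo = np.floor(pts).astype(int)
    lo = np.minimum(lo, [D - 2, H - 2, W - 2])  # keep the +1 corner in bounds
    frac = pts - lo
    out = np.zeros((pts.shape[0], C))
    # Accumulate the 8 surrounding corners, weighted by distance along each axis.
    for dz in (0, 1):
        for dy in (0, 1):
            for dx in (0, 1):
                corner = grid[lo[:, 0] + dz, lo[:, 1] + dy, lo[:, 2] + dx]
                w = (np.where(dz, frac[:, 0], 1 - frac[:, 0])
                     * np.where(dy, frac[:, 1], 1 - frac[:, 1])
                     * np.where(dx, frac[:, 2], 1 - frac[:, 2]))
                out += w[:, None] * corner
    return out
```

With a linear feature field, the interpolation is exact; e.g. a grid whose channel stores the x index returns 0.5 at the cell center.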
Abstract
NeRF-based methods reconstruct 3D scenes by building a radiance field with implicit or explicit representations. While NeRF-based methods can perform novel view synthesis (NVS) at arbitrary scale, high-resolution novel view synthesis (HRNVS) with low-resolution (LR) optimization often results in over-smoothing. On the other hand, single-image super-resolution (SR) aims to enhance LR images into HR counterparts but lacks multi-view consistency. To address these challenges, we propose Arbitrary-Scale Super-Resolution NeRF (ASSR-NeRF), a novel framework for super-resolution novel view synthesis (SRNVS). We propose an attention-based VoxelGridSR model to directly perform 3D SR on the optimized volume. Our model is trained on diverse scenes to ensure generalizability. For unseen scenes trained with LR views, we can then directly apply our VoxelGridSR to further refine the volume and achieve multi-view consistent SR. We demonstrate quantitatively and qualitatively that the proposed method achieves significant performance gains in SRNVS.
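The abstract describes VoxelGridSR as attention-based refinement of the optimized volume. One common way to realize this is scaled dot-product attention in which a feature sampled at a query point attends over the features of its surrounding voxel corners; the sketch below illustrates that pattern only (the function name, shapes, and neighborhood size are assumptions, not the paper's architecture):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_refine(query_feat, neighbor_feats):
    """Refine a sampled point feature by attending over voxel-neighborhood features.

    query_feat:     (d,)   feature interpolated at an arbitrary-scale sample point.
    neighbor_feats: (k, d) features of the k surrounding voxel corners.
    Returns a refined (d,) feature. Hypothetical sketch of attention-based
    voxel refinement, not the authors' actual VoxelGridSR model.
    """
    d = query_feat.shape[-1]
    scores = neighbor_feats @ query_feat / np.sqrt(d)  # (k,) scaled dot products
    weights = softmax(scores)                          # attention weights, sum to 1
    return weights @ neighbor_feats                    # weighted aggregation
```

Because the weights form a convex combination, the refined feature stays within the span of the local voxel features, which is one way such a design can preserve consistency across views that sample the same 3D location.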