AI Summary
This work addresses the limitations of existing Vision Transformer (ViT)-based feed-forward methods for novel view synthesis, which suffer from low input resolution and a lack of 3D consistency in their generation modules, leading to loss of high-frequency detail and structural inconsistencies across views. To overcome these issues, we propose a novel framework that integrates a dual-domain detail-aware module with a feature-guided one-step diffusion network. Our approach preserves the ViT's geometric priors while leveraging 3D Gaussian Splatting to achieve high-resolution, high-fidelity, and multi-view-consistent rendering. Crucially, we unify high-resolution detail enhancement with 3D-aware geometric representation in a joint optimization framework, co-training the ViT backbone and the diffusion refinement module. Experiments demonstrate that our method significantly outperforms existing feed-forward approaches across multiple benchmarks.
Abstract
We present a novel framework for high-fidelity novel view synthesis (NVS) from sparse images, addressing key limitations of recent feed-forward 3D Gaussian Splatting (3DGS) methods built on Vision Transformer (ViT) backbones. While ViT-based pipelines offer strong geometric priors, computational cost often constrains them to low-resolution inputs. Moreover, existing generative enhancement methods tend to be 3D-agnostic, producing inconsistent structures across views, especially in unseen regions. To overcome these challenges, we design a Dual-Domain Detail Perception Module, which handles high-resolution images without being limited by the ViT backbone and endows the Gaussians with additional features that store high-frequency details. We develop a feature-guided diffusion network that preserves high-frequency details during the restoration process. We further introduce a unified training strategy that jointly optimizes the ViT-based geometric backbone and the diffusion-based refinement module. Experiments demonstrate that our method maintains superior generation quality across multiple datasets.
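The joint optimization described above can be pictured as a two-stage pipeline trained under one combined objective: the backbone predicts Gaussians plus auxiliary detail features, the rendering is refined by the feature-guided network, and a weighted sum of refined and coarse losses drives both modules. The following is a minimal toy sketch of that training signal; every function, the scalar "images", and the loss weighting are illustrative stand-ins, not the paper's actual components.

```python
# Toy sketch of joint optimization over a backbone and a refinement stage.
# All names (backbone_forward, render, refine, joint_loss) and the 0.5
# weighting are hypothetical placeholders for the modules in the abstract.

def backbone_forward(images):
    """Stand-in for the ViT backbone: coarse Gaussians + detail features."""
    gaussians = [p * 0.5 for p in images]   # toy "geometry" prediction
    features = [p * 0.1 for p in images]    # toy high-frequency features
    return gaussians, features

def render(gaussians):
    """Stand-in for 3DGS rasterization of the predicted Gaussians."""
    return [g * 2.0 for g in gaussians]

def refine(rendered, features):
    """Stand-in for the feature-guided diffusion refiner."""
    return [r + f for r, f in zip(rendered, features)]

def joint_loss(refined, target, coarse, lam=0.5):
    """Combined objective: refined-output MSE + weighted coarse-render MSE,
    so gradients reach both the backbone and the refiner."""
    l_refined = sum((p - t) ** 2 for p, t in zip(refined, target)) / len(target)
    l_coarse = sum((c - t) ** 2 for c, t in zip(coarse, target)) / len(target)
    return l_refined + lam * l_coarse

images = [1.0, 2.0, 3.0]
gaussians, feats = backbone_forward(images)
coarse = render(gaussians)
refined = refine(coarse, feats)
loss = joint_loss(refined, images, coarse)
```

In a real system the loss would backpropagate through both stages in one step, which is what distinguishes this joint scheme from training a 3D-agnostic enhancer separately.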