NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

📅 2024-05-24
🏛️ arXiv.org
📈 Citations: 19
Influential: 3
🤖 AI Summary
This paper introduces a novel zero-shot novel view synthesis (NVS) paradigm that generates high-fidelity novel views from single/multi-view static scenes or monocular dynamic videos—without fine-tuning pre-trained video diffusion models. To address the challenge of geometry-aware synthesis under zero-shot conditions, the method (1) pioneers the adaptation of video diffusion models for zero-shot NVS; (2) proposes an iterative score function modulation mechanism grounded in scene geometric priors, augmented by optical-flow-guided view deformation; and (3) theoretically derives and implements pose- and sampling-step-adaptive error-bound control to stabilize generation. Extensive experiments demonstrate state-of-the-art performance on both static and dynamic benchmarks, with significant improvements in quantitative metrics (e.g., PSNR, SSIM, LPIPS) and qualitative fidelity. The framework supports diverse input modalities—including single image, multi-view images, and monocular video—and the code is publicly released.
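The summary above refers to warping the given input views into the target poses so they can serve as scene priors during sampling. As a rough illustration only, the sketch below shows a generic depth-based forward warp (back-project pixels with a depth map, transform by the relative pose, re-project into the target view); it is not the paper's implementation, and the optical-flow-guided deformation used for dynamic scenes is omitted. All names (`warp_to_target`, `T_src_to_tgt`, etc.) are hypothetical.

```python
# A minimal sketch (not the authors' code) of depth-based view warping.
import numpy as np

def warp_to_target(src_img, depth, K, T_src_to_tgt):
    """src_img: (H, W, 3), depth: (H, W), K: (3, 3) intrinsics,
    T_src_to_tgt: (4, 4) relative pose. Returns warped image and validity mask."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, HW)

    # Back-project pixels into source camera coordinates using the depth map.
    pts_src = np.linalg.inv(K) @ (pix * depth.reshape(1, -1))           # (3, HW)
    pts_src_h = np.vstack([pts_src, np.ones((1, pts_src.shape[1]))])    # (4, HW)

    # Transform the points into the target camera frame and project them.
    pts_tgt = (T_src_to_tgt @ pts_src_h)[:3]
    proj = K @ pts_tgt
    u_t = np.round(proj[0] / np.clip(proj[2], 1e-6, None)).astype(int)
    v_t = np.round(proj[1] / np.clip(proj[2], 1e-6, None)).astype(int)

    # Splat source colors into the target view; mask marks covered pixels.
    warped = np.zeros_like(src_img)
    mask = np.zeros((H, W), dtype=bool)
    valid = (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H) & (pts_tgt[2] > 0)
    warped[v_t[valid], u_t[valid]] = src_img.reshape(-1, 3)[valid]
    mask[v_t[valid], u_t[valid]] = True
    return warped, mask
```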

📝 Abstract
By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose NVS-Solver, a new novel view synthesis (NVS) paradigm that operates without the need for training. NVS-Solver adaptively modulates the diffusion sampling process with the given views to enable the creation of remarkable visual experiences from single or multiple views of static scenes or monocular videos of dynamic scenes. Specifically, built upon our theoretical modeling, we iteratively modulate the score function with the given scene priors represented with warped input views to control the video diffusion process. Moreover, by theoretically exploring the boundary of the estimation error, we achieve the modulation in an adaptive fashion according to the view pose and the number of diffusion steps. Extensive evaluations on both static and dynamic scenes substantiate the significant superiority of our NVS-Solver over state-of-the-art methods both quantitatively and qualitatively. Source code: https://github.com/ZHU-Zhiyu/NVS_Solver.
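The abstract describes iteratively modulating the score function with warped-view priors, with the modulation strength adapted to the view pose and the sampling step. The sketch below is a minimal, hypothetical rendering of that idea on top of a plain DDIM-style loop: the predicted clean latent is blended with the warped views on valid pixels, and the blending weight decays with pose distance and with sampling progress. The weighting rule, the `denoiser` interface, and all names are illustrative assumptions, not the paper's derived error-bound formulation.

```python
# Hypothetical guided-sampling sketch, assuming a noise-predicting `denoiser`
# and precomputed warped views with validity masks; not the paper's exact rule.
import torch

def guided_ddim_sampling(denoiser, alphas_cumprod, timesteps, latents,
                         warped, mask, pose_dist, lam=2.0):
    """latents: (F, C, H, W) initial noise; warped, mask: per-frame scene priors;
    pose_dist: (F,) distance of each target pose to its nearest input view."""
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        a_prev = (alphas_cumprod[timesteps[i + 1]]
                  if i + 1 < len(timesteps) else torch.tensor(1.0))

        # The denoiser predicts the noise; convert it to a clean-latent estimate.
        eps = denoiser(latents, t)
        x0 = (latents - (1 - a_t).sqrt() * eps) / a_t.sqrt()

        # Assumed adaptive weight: trust the warped prior more for target poses
        # close to an input view and at earlier (noisier) sampling steps.
        step_frac = i / max(len(timesteps) - 1, 1)
        w = torch.exp(-lam * pose_dist)[:, None, None, None] * (1.0 - step_frac)

        # Modulate the clean-latent estimate toward the warped views where valid.
        x0 = torch.where(mask, (1 - w) * x0 + w * warped, x0)

        # Deterministic DDIM update using the modulated estimate.
        latents = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    return latents
```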
Problem

Research questions and friction points this paper is trying to address.

Leveraging video diffusion models for novel view synthesis without training
Adaptively modulating diffusion sampling with scene priors from input views
Achieving superior performance in static and dynamic scene view synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reuses a pre-trained video diffusion model without any fine-tuning
Adaptively modulates the diffusion sampling process with warped-view scene priors
Derives a theoretical bound on the estimation error to set the modulation strength per view pose and sampling step
Meng You
Department of Computer Science, City University of Hong Kong, Hong Kong SAR
Zhiyu Zhu
Shanxi University
Hui Liu
School of Computing and Information Sciences, Saint Francis University, Hong Kong SAR
Junhui Hou
Department of Computer Science, City University of Hong Kong
Neural Spatial Computing