NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer

📅 2024-05-24
🏛️ arXiv.org
📈 Citations: 19
Influential: 3
🤖 AI Summary
This paper introduces a novel zero-shot novel view synthesis (NVS) paradigm that generates high-fidelity novel views from single/multi-view static scenes or monocular dynamic videos—without fine-tuning pre-trained video diffusion models. To address the challenge of geometry-aware synthesis under zero-shot conditions, the method (1) pioneers the adaptation of video diffusion models for zero-shot NVS; (2) proposes an iterative score function modulation mechanism grounded in scene geometric priors, augmented by optical-flow-guided view deformation; and (3) theoretically derives and implements pose- and sampling-step-adaptive error-bound control to stabilize generation. Extensive experiments demonstrate state-of-the-art performance on both static and dynamic benchmarks, with significant improvements in quantitative metrics (e.g., PSNR, SSIM, LPIPS) and qualitative fidelity. The framework supports diverse input modalities—including single image, multi-view images, and monocular video—and the code is publicly released.
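The summary above refers to warping the given input views into the target poses so they can serve as scene priors during sampling. As a rough illustration only, the sketch below shows a generic depth-based forward warp (back-project pixels with a depth map, transform by the relative pose, re-project into the target view); it is not the paper's implementation, and the optical-flow-guided deformation used for dynamic scenes is omitted. All names (`warp_to_target`, `T_src_to_tgt`, etc.) are hypothetical.

```python
# A minimal sketch (not the authors' code) of depth-based view warping.
import numpy as np

def warp_to_target(src_img, depth, K, T_src_to_tgt):
    """src_img: (H, W, 3), depth: (H, W), K: (3, 3) intrinsics,
    T_src_to_tgt: (4, 4) relative pose. Returns warped image and validity mask."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, HW)

    # Back-project pixels into source camera coordinates using the depth map.
    pts_src = np.linalg.inv(K) @ (pix * depth.reshape(1, -1))           # (3, HW)
    pts_src_h = np.vstack([pts_src, np.ones((1, pts_src.shape[1]))])    # (4, HW)

    # Transform the points into the target camera frame and project them.
    pts_tgt = (T_src_to_tgt @ pts_src_h)[:3]
    proj = K @ pts_tgt
    u_t = np.round(proj[0] / np.clip(proj[2], 1e-6, None)).astype(int)
    v_t = np.round(proj[1] / np.clip(proj[2], 1e-6, None)).astype(int)

    # Splat source colors into the target view; mask marks covered pixels.
    warped = np.zeros_like(src_img)
    mask = np.zeros((H, W), dtype=bool)
    valid = (u_t >= 0) & (u_t < W) & (v_t >= 0) & (v_t < H) & (pts_tgt[2] > 0)
    warped[v_t[valid], u_t[valid]] = src_img.reshape(-1, 3)[valid]
    mask[v_t[valid], u_t[valid]] = True
    return warped, mask
```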

📝 Abstract
By harnessing the potent generative capabilities of pre-trained large video diffusion models, we propose NVS-Solver, a new novel view synthesis (NVS) paradigm that operates without the need for training. NVS-Solver adaptively modulates the diffusion sampling process with the given views to enable the creation of remarkable visual experiences from single or multiple views of static scenes or monocular videos of dynamic scenes. Specifically, built upon our theoretical modeling, we iteratively modulate the score function with the given scene priors represented with warped input views to control the video diffusion process. Moreover, by theoretically exploring the boundary of the estimation error, we achieve the modulation in an adaptive fashion according to the view pose and the number of diffusion steps. Extensive evaluations on both static and dynamic scenes substantiate the significant superiority of our NVS-Solver over state-of-the-art methods both quantitatively and qualitatively. Source code: https://github.com/ZHU-Zhiyu/NVS_Solver.
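The abstract describes iteratively modulating the score function with warped-view priors, with the modulation strength adapted to the view pose and the sampling step. The sketch below is a minimal, hypothetical rendering of that idea on top of a plain DDIM-style loop: the predicted clean latent is blended with the warped views on valid pixels, and the blending weight decays with pose distance and with sampling progress. The weighting rule, the `denoiser` interface, and all names are illustrative assumptions, not the paper's derived error-bound formulation.

```python
# Hypothetical guided-sampling sketch, assuming a noise-predicting `denoiser`
# and precomputed warped views with validity masks; not the paper's exact rule.
import torch

def guided_ddim_sampling(denoiser, alphas_cumprod, timesteps, latents,
                         warped, mask, pose_dist, lam=2.0):
    """latents: (F, C, H, W) initial noise; warped, mask: per-frame scene priors;
    pose_dist: (F,) distance of each target pose to its nearest input view."""
    for i, t in enumerate(timesteps):
        a_t = alphas_cumprod[t]
        a_prev = (alphas_cumprod[timesteps[i + 1]]
                  if i + 1 < len(timesteps) else torch.tensor(1.0))

        # The denoiser predicts the noise; convert it to a clean-latent estimate.
        eps = denoiser(latents, t)
        x0 = (latents - (1 - a_t).sqrt() * eps) / a_t.sqrt()

        # Assumed adaptive weight: trust the warped prior more for target poses
        # close to an input view and at earlier (noisier) sampling steps.
        step_frac = i / max(len(timesteps) - 1, 1)
        w = torch.exp(-lam * pose_dist)[:, None, None, None] * (1.0 - step_frac)

        # Modulate the clean-latent estimate toward the warped views where valid.
        x0 = torch.where(mask, (1 - w) * x0 + w * warped, x0)

        # Deterministic DDIM update using the modulated estimate.
        latents = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
    return latents
```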
Problem

Research questions and friction points this paper is trying to address.

Leveraging video diffusion models for novel view synthesis without training
Adaptively modulating diffusion sampling with scene priors from input views
Achieving superior performance in static and dynamic scene view synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reuses a pre-trained video diffusion model without any fine-tuning
Adaptively modulates the diffusion sampling process with warped-view scene priors
Derives a theoretical bound on the estimation error to set the modulation strength per view pose and sampling step
Meng You
Department of Computer Science, City University of Hong Kong, Hong Kong SAR
Zhiyu Zhu
Shanxi University
Hui Liu
School of Computing and Information Sciences, Saint Francis University, Hong Kong SAR
Junhui Hou
Department of Computer Science, City University of Hong Kong
Neural Spatial Computing