🤖 AI Summary
This work addresses training-free dynamic novel-view synthesis from monocular video. Methodologically, it formulates the task as an inverse problem solved with a pre-trained video diffusion model, with two key innovations: (i) a K-order recursive noise representation that overcomes the failure of deterministic inversion caused by zero-terminal signal-to-noise-ratio (SNR) noise schedules; and (ii) a stochastic latent-space modulation mechanism that enables visibility-aware occlusion completion. The technical pipeline integrates DDIM inversion, VAE latent-code alignment, recursive noise modeling, and stochastic latent sampling. Evaluated on multiple dynamic scenes, the approach achieves high-fidelity, artifact-free novel-view synthesis without fine-tuning, auxiliary networks, or additional supervision, significantly improving both reconstruction quality and generalization for monocular-video-driven dynamic view synthesis.
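To make the zero-terminal-SNR obstacle mentioned above concrete, here is a minimal sketch of one deterministic DDIM inversion step. This is the standard DDIM update, not the paper's K-order recursive representation; the function name and NumPy framing are illustrative only.

```python
import numpy as np

def ddim_invert_step(x_t, eps, abar_t, abar_next):
    """One deterministic DDIM inversion step mapping x_t to x_{t+1}.

    x_t       : current latent
    eps       : model-predicted noise at step t (here passed in directly)
    abar_t    : cumulative alpha-bar at step t
    abar_next : cumulative alpha-bar at step t+1
    """
    # Predict the clean latent implied by (x_t, eps) ...
    x0_pred = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
    # ... then re-noise it to the next (noisier) timestep.
    return np.sqrt(abar_next) * x0_pred + np.sqrt(1.0 - abar_next) * eps
```

Note what happens at a terminal timestep under a zero-terminal-SNR schedule: `abar_next = 0`, so the step returns `eps` exactly regardless of `x_t`. All content information is discarded at that step, which is why deterministic inversion breaks down there and motivates a different noise representation.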
📝 Abstract
In this work, we address dynamic view synthesis from monocular videos as an inverse problem in a training-free setting. By redesigning the noise-initialization phase of a pre-trained video diffusion model, we enable high-fidelity dynamic view synthesis without any weight updates or auxiliary modules. We begin by identifying a fundamental obstacle to deterministic inversion arising from zero-terminal signal-to-noise-ratio (SNR) schedules and resolve it by introducing a novel noise representation, termed the K-order Recursive Noise Representation. We derive a closed-form expression for this representation, enabling precise and efficient alignment between the VAE-encoded and DDIM-inverted latents. To synthesize regions newly revealed by camera motion, we introduce Stochastic Latent Modulation, which performs visibility-aware sampling over the latent space to complete occluded regions. Comprehensive experiments demonstrate that dynamic view synthesis can be performed effectively through structured latent manipulation in the noise-initialization phase.
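The abstract does not spell out the mechanism of Stochastic Latent Modulation, so the following is only a guess at what visibility-aware latent sampling might look like: occluded latent entries are resampled from the statistics of the visible ones so the diffusion prior can plausibly complete them. The function name, mask convention, and Gaussian resampling are all assumptions, not the paper's method.

```python
import numpy as np

def modulate_occluded_latents(latent, visibility_mask, rng=None):
    """Hypothetical sketch: stochastically fill occluded latent entries.

    latent          : (C, H, W) latent tensor
    visibility_mask : (H, W) bool, True where the novel view sees content
                      warped from the input view, False where it is occluded
    """
    rng = np.random.default_rng(rng)
    out = latent.copy()
    visible = latent[:, visibility_mask]            # (C, N_visible)
    mu = visible.mean(axis=1, keepdims=True)        # per-channel mean
    sigma = visible.std(axis=1, keepdims=True)      # per-channel std
    n_occ = int((~visibility_mask).sum())
    # Sample occluded entries from the visible-region distribution so the
    # denoiser starts from statistically consistent noise there.
    out[:, ~visibility_mask] = mu + sigma * rng.standard_normal(
        (latent.shape[0], n_occ)
    )
    return out
```

Visible entries pass through unchanged; only the occluded region is perturbed, which mirrors the visibility-aware framing in the abstract.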