AnyView: Synthesizing Any Novel View in Dynamic Scenes

📅 2026-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative video models struggle to maintain multi-view and spatiotemporal consistency in highly dynamic real-world scenes. This work proposes AnyView, a diffusion-based dynamic view synthesis framework that combines monocular (2D), multi-view static (3D), and multi-view dynamic (4D) data to learn a general-purpose spatiotemporal implicit representation without strong geometric inductive biases. AnyView generates novel videos zero-shot from arbitrary viewpoints, sidestepping conventional geometric assumptions and handling extreme viewpoint changes. To rigorously evaluate performance in such challenging scenarios, the authors introduce a new benchmark, AnyViewBench. Experiments show that AnyView is competitive with the state of the art on standard benchmarks and significantly outperforms existing methods on AnyViewBench, producing videos with high realism, plausibility, and spatiotemporal coherence.
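
The summary describes a camera-conditioned video diffusion model trained jointly on 2D, 3D, and 4D data. As a rough sketch of what such mixed-supervision training could look like, here is a minimal PyTorch example; the CameraConditionedDenoiser architecture, the null-camera conditioning for monocular clips, and all tensor shapes are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CameraConditionedDenoiser(nn.Module):
    """Toy camera-conditioned video denoiser (not the AnyView architecture).

    Takes a noisy latent video (B, T, C, H, W), per-frame cameras flattened
    to (B, T, 12) (a 3x4 extrinsic matrix), and a noise level t in [0, 1].
    """
    def __init__(self, channels=4, cam_dim=12, hidden=64):
        super().__init__()
        self.cam_proj = nn.Linear(cam_dim, hidden)   # pose conditioning
        self.time_proj = nn.Linear(1, hidden)        # noise-level embedding
        self.conv_in = nn.Conv3d(channels, hidden, 3, padding=1)
        self.conv_out = nn.Conv3d(hidden, channels, 3, padding=1)

    def forward(self, x, cams, t):
        h = self.conv_in(x.transpose(1, 2))          # (B, hidden, T, H, W)
        cond = self.cam_proj(cams).mean(dim=1) + self.time_proj(t[:, None])
        h = F.relu(h + cond[:, :, None, None, None])
        return self.conv_out(h).transpose(1, 2)      # (B, T, C, H, W)

def diffusion_step(model, video, cams, opt):
    """One epsilon-prediction training step (plain DDPM-style objective)."""
    b = video.shape[0]
    t = torch.rand(b)                                # random noise level
    alpha = (1.0 - t).view(b, 1, 1, 1, 1)            # toy linear schedule
    noise = torch.randn_like(video)
    noisy = alpha.sqrt() * video + (1.0 - alpha).sqrt() * noise
    loss = F.mse_loss(model(noisy, cams, t), noise)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def fake_batch(kind, b=2, t=8, c=4, hw=16):
    """Random stand-ins for the three supervision levels in the summary."""
    video = torch.randn(b, t, c, hw, hw)
    if kind == "monocular":      # 2D: unknown cameras -> null conditioning
        cams = torch.zeros(b, t, 12)
    elif kind == "static":       # 3D: moving camera over a frozen scene
        video = video[:, :1].expand(-1, t, -1, -1, -1).contiguous()
        cams = torch.randn(b, t, 12)
    else:                        # 4D: moving camera and a moving scene
        cams = torch.randn(b, t, 12)
    return video, cams

model = CameraConditionedDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(3):               # interleave the three data sources
    for kind in ("monocular", "static", "dynamic"):
        video, cams = fake_batch(kind)
        diffusion_step(model, video, cams, opt)
```

The point being illustrated is that one denoiser consumes all three data regimes, with camera supervision simply absent (here, zeroed) when it is unavailable; how AnyView actually handles unposed monocular data is not stated in this summary.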

📝 Abstract
Modern generative video models excel at producing convincing, high-quality outputs, but struggle to maintain multi-view and spatiotemporal consistency in highly dynamic real-world environments. In this work, we introduce AnyView, a diffusion-based video generation framework for dynamic view synthesis with minimal inductive biases or geometric assumptions. We leverage multiple data sources with various levels of supervision, including monocular (2D), multi-view static (3D), and multi-view dynamic (4D) datasets, to train a generalist spatiotemporal implicit representation capable of producing zero-shot novel videos from arbitrary camera locations and trajectories. We evaluate AnyView on standard benchmarks, showing competitive results with the current state of the art, and propose AnyViewBench, a challenging new benchmark tailored towards extreme dynamic view synthesis in diverse real-world scenarios. In this more dramatic setting, we find that most baselines drastically degrade in performance, as they require significant overlap between viewpoints, while AnyView maintains the ability to produce realistic, plausible, and spatiotemporally consistent videos when prompted from any viewpoint. Results, data, code, and models can be viewed at: https://tri-ml.github.io/AnyView/
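
To make "zero-shot novel videos from arbitrary camera locations and trajectories" concrete, the sketch below runs a generic deterministic (DDIM-style) denoising loop over the toy denoiser from the training sketch above, conditioned on an arbitrary camera trajectory. The sampler, step schedule, and shapes are assumptions for illustration; the abstract does not specify AnyView's sampling procedure.

```python
import torch  # continues from the training sketch above, reusing `model`

@torch.no_grad()
def synthesize(model, cams, steps=50, shape=(1, 8, 4, 16, 16)):
    """Denoise a random latent into a video for an arbitrary camera path.

    `cams` is a (B, T, 12) trajectory of flattened 3x4 extrinsics. The
    update is a generic deterministic DDIM-style step with a toy linear
    schedule, used only to illustrate conditioning on the trajectory.
    """
    x = torch.randn(shape)
    for i in reversed(range(1, steps + 1)):
        t = torch.full((shape[0],), i / (steps + 1))
        a = (1.0 - t).view(-1, 1, 1, 1, 1)          # current signal level
        eps = model(x, cams, t)                     # predicted noise
        x0 = (x - (1.0 - a).sqrt() * eps) / a.sqrt()
        a_next = 1.0 - (i - 1) / (steps + 1)        # next signal level
        x = (a_next ** 0.5) * x0 + ((1.0 - a_next) ** 0.5) * eps
    return x

# Example: synthesize a video for an arbitrary (here random) 8-frame path.
trajectory = torch.randn(1, 8, 12)
video_latent = synthesize(model, trajectory)        # (1, 8, 4, 16, 16)
```

Because the trajectory enters only as conditioning, nothing constrains it to overlap with training viewpoints, which is the property the abstract emphasizes for the extreme-viewpoint setting.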
Problem

Research questions and friction points this paper is trying to address.

dynamic view synthesis
spatiotemporal consistency
novel view synthesis
multi-view consistency
video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic view synthesis
diffusion-based video generation
spatiotemporal consistency
zero-shot novel view synthesis
implicit neural representation