AnyView: Synthesizing Any Novel View in Dynamic Scenes

📅 2026-01-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative video models struggle to maintain multi-view and spatiotemporal consistency in highly dynamic real-world scenes. This work proposes AnyView, a diffusion-based dynamic view synthesis framework that combines monocular (2D), multi-view static (3D), and multi-view dynamic (4D) data to learn a general-purpose spatiotemporal implicit representation without strong geometric inductive biases. AnyView generates novel videos zero-shot from arbitrary viewpoints, sidestepping conventional geometric assumptions and handling extreme viewpoint changes. To rigorously evaluate performance in such challenging scenarios, the authors introduce a new benchmark, AnyViewBench. Experiments show that AnyView is competitive with the state of the art on standard benchmarks and significantly outperforms existing methods on AnyViewBench, producing videos with high realism, plausibility, and spatiotemporal coherence.
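
The summary describes a camera-conditioned video diffusion model trained jointly on 2D, 3D, and 4D data. As a rough sketch of what such mixed-supervision training could look like, here is a minimal PyTorch example; the CameraConditionedDenoiser architecture, the null-camera conditioning for monocular clips, and all tensor shapes are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CameraConditionedDenoiser(nn.Module):
    """Toy camera-conditioned video denoiser (not the AnyView architecture).

    Takes a noisy latent video (B, T, C, H, W), per-frame cameras flattened
    to (B, T, 12) (a 3x4 extrinsic matrix), and a noise level t in [0, 1].
    """
    def __init__(self, channels=4, cam_dim=12, hidden=64):
        super().__init__()
        self.cam_proj = nn.Linear(cam_dim, hidden)   # pose conditioning
        self.time_proj = nn.Linear(1, hidden)        # noise-level embedding
        self.conv_in = nn.Conv3d(channels, hidden, 3, padding=1)
        self.conv_out = nn.Conv3d(hidden, channels, 3, padding=1)

    def forward(self, x, cams, t):
        h = self.conv_in(x.transpose(1, 2))          # (B, hidden, T, H, W)
        cond = self.cam_proj(cams).mean(dim=1) + self.time_proj(t[:, None])
        h = F.relu(h + cond[:, :, None, None, None])
        return self.conv_out(h).transpose(1, 2)      # (B, T, C, H, W)

def diffusion_step(model, video, cams, opt):
    """One epsilon-prediction training step (plain DDPM-style objective)."""
    b = video.shape[0]
    t = torch.rand(b)                                # random noise level
    alpha = (1.0 - t).view(b, 1, 1, 1, 1)            # toy linear schedule
    noise = torch.randn_like(video)
    noisy = alpha.sqrt() * video + (1.0 - alpha).sqrt() * noise
    loss = F.mse_loss(model(noisy, cams, t), noise)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def fake_batch(kind, b=2, t=8, c=4, hw=16):
    """Random stand-ins for the three supervision levels in the summary."""
    video = torch.randn(b, t, c, hw, hw)
    if kind == "monocular":      # 2D: unknown cameras -> null conditioning
        cams = torch.zeros(b, t, 12)
    elif kind == "static":       # 3D: moving camera over a frozen scene
        video = video[:, :1].expand(-1, t, -1, -1, -1).contiguous()
        cams = torch.randn(b, t, 12)
    else:                        # 4D: moving camera and a moving scene
        cams = torch.randn(b, t, 12)
    return video, cams

model = CameraConditionedDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(3):               # interleave the three data sources
    for kind in ("monocular", "static", "dynamic"):
        video, cams = fake_batch(kind)
        diffusion_step(model, video, cams, opt)
```

The point being illustrated is that one denoiser consumes all three data regimes, with camera supervision simply absent (here, zeroed) when it is unavailable; how AnyView actually handles unposed monocular data is not stated in this summary.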

📝 Abstract
Modern generative video models excel at producing convincing, high-quality outputs, but struggle to maintain multi-view and spatiotemporal consistency in highly dynamic real-world environments. In this work, we introduce AnyView, a diffusion-based video generation framework for dynamic view synthesis with minimal inductive biases or geometric assumptions. We leverage multiple data sources with various levels of supervision, including monocular (2D), multi-view static (3D), and multi-view dynamic (4D) datasets, to train a generalist spatiotemporal implicit representation capable of producing zero-shot novel videos from arbitrary camera locations and trajectories. We evaluate AnyView on standard benchmarks, showing competitive results with the current state of the art, and propose AnyViewBench, a challenging new benchmark tailored towards extreme dynamic view synthesis in diverse real-world scenarios. In this more dramatic setting, we find that most baselines drastically degrade in performance, as they require significant overlap between viewpoints, while AnyView maintains the ability to produce realistic, plausible, and spatiotemporally consistent videos when prompted from any viewpoint. Results, data, code, and models can be viewed at: https://tri-ml.github.io/AnyView/
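
To make "zero-shot novel videos from arbitrary camera locations and trajectories" concrete, the sketch below runs a generic deterministic (DDIM-style) denoising loop over the toy denoiser from the training sketch above, conditioned on an arbitrary camera trajectory. The sampler, step schedule, and shapes are assumptions for illustration; the abstract does not specify AnyView's sampling procedure.

```python
import torch  # continues from the training sketch above, reusing `model`

@torch.no_grad()
def synthesize(model, cams, steps=50, shape=(1, 8, 4, 16, 16)):
    """Denoise a random latent into a video for an arbitrary camera path.

    `cams` is a (B, T, 12) trajectory of flattened 3x4 extrinsics. The
    update is a generic deterministic DDIM-style step with a toy linear
    schedule, used only to illustrate conditioning on the trajectory.
    """
    x = torch.randn(shape)
    for i in reversed(range(1, steps + 1)):
        t = torch.full((shape[0],), i / (steps + 1))
        a = (1.0 - t).view(-1, 1, 1, 1, 1)          # current signal level
        eps = model(x, cams, t)                     # predicted noise
        x0 = (x - (1.0 - a).sqrt() * eps) / a.sqrt()
        a_next = 1.0 - (i - 1) / (steps + 1)        # next signal level
        x = (a_next ** 0.5) * x0 + ((1.0 - a_next) ** 0.5) * eps
    return x

# Example: synthesize a video for an arbitrary (here random) 8-frame path.
trajectory = torch.randn(1, 8, 12)
video_latent = synthesize(model, trajectory)        # (1, 8, 4, 16, 16)
```

Because the trajectory enters only as conditioning, nothing constrains it to overlap with training viewpoints, which is the property the abstract emphasizes for the extreme-viewpoint setting.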
Problem

Research questions and friction points this paper is trying to address.

dynamic view synthesis
spatiotemporal consistency
novel view synthesis
multi-view consistency
video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic view synthesis
diffusion-based video generation
spatiotemporal consistency
zero-shot novel view synthesis
implicit neural representation