Full-4D: Generating Full-Scope 4D Scenes from a Single-View Video

📅 2026-05-25
🤖 AI Summary
This work addresses the challenge of generating complete dynamic 4D scenes from a single-view video, a setting that suffers from severe information deficiency and cross-view inconsistency. The authors propose an end-to-end framework that first synthesizes synchronized multi-view videos with a diffusion model and then reconstructs an explicit dynamic scene from them via optimization-based 4D Gaussian Splatting (4DGS). Key contributions include Real-MV-4D, a large-scale dataset of synchronized multi-view videos of real-world scenes; a fused time-view attention mechanism that embeds geometric reprojection priors and explicit camera conditioning; and a Flow Matching Distillation loss that improves novel-view rendering consistency. Experiments show that the method outperforms existing approaches in both visual fidelity and geometric consistency, enabling full-scope dynamic 4D scene generation from a single-view video.
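The fused time-view attention is described only at a high level. As an illustration, and assuming each frame of each synthesized view is summarized by a single feature vector, the sketch below flattens a T×V grid and runs one scaled dot-product attention pass over all cells, so every frame of every view can attend to every other. The `bias` argument is a hypothetical hook marking where a geometric prior could enter the attention logits. The function name, the use of raw tokens as queries, keys and values, and the bias form are all assumptions for illustration, not the authors' architecture.

```python
import math
from typing import List, Optional

Vector = List[float]


def fused_time_view_attention(
    grid: List[List[Vector]],
    bias: Optional[List[List[float]]] = None,
) -> List[List[Vector]]:
    """Single joint self-attention pass over all (time, view) cells of a T x V grid.

    grid[t][v] holds one feature vector for frame t of view v.
    bias, if given, is a (T*V) x (T*V) additive term on the attention logits
    (a hypothetical stand-in for a geometric prior).
    """
    num_t, num_v = len(grid), len(grid[0])
    # Flatten row-major so index i = t * num_v + v; time and view then mix in one pass.
    tokens = [grid[t][v] for t in range(num_t) for v in range(num_v)]
    n, d = len(tokens), len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    out: List[Vector] = []
    for i in range(n):
        # Scaled dot-product logits from token i to every token j
        # (queries, keys and values are the raw tokens; no learned projections).
        logits = [scale * sum(a * b for a, b in zip(tokens[i], tokens[j])) for j in range(n)]
        if bias is not None:
            logits = [z + b for z, b in zip(logits, bias[i])]
        # Numerically stable softmax over all T*V cells.
        peak = max(logits)
        weights = [math.exp(z - peak) for z in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted average of all tokens.
        out.append([sum(w * tokens[j][k] for j, w in enumerate(weights)) for k in range(d)])
    # Restore the T x V layout.
    return [[out[t * num_v + v] for v in range(num_v)] for t in range(num_t)]
```

Passing a bias of 0 on the diagonal and a large negative value elsewhere makes each cell attend only to itself, which returns the input grid unchanged; a practical model would add learned projections, multiple heads, and a prior-derived bias instead.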
πŸ“ Abstract
Generating 4D scenes from a single-view video is inherently ill-posed: a single viewpoint lacks the information needed to recover a complete, dynamic scene with full coverage. Existing methods are typically limited to monocular videos, simple 3D effects, or only small viewpoint perturbations around the original viewpoint, falling short of true 4D generation. Meanwhile, the lack of large-scale datasets capturing full-scope 4D scenes with synchronized multi-view videos further hinders progress in this direction. We propose a novel single-view video-to-4D framework that casts full-scope 4D generation as multi-view video synthesis followed by optimization-based 4D reconstruction from the generated views. To instantiate this formulation end-to-end, we make three key contributions. First, we introduce Real-MV-4D, a large-scale dataset of synchronized multi-view videos captured in diverse real-world environments to provide 4D supervision. Second, we train a multi-view video diffusion model driven by a novel fused time (T)-view (V) attention mechanism that directly embeds geometric reprojection priors and explicit camera conditioning into its view-time interactions. Unlike basic feature fusion, this direct binding strictly aligns the generation process with physical 3D priors to produce a dense, synchronized T×V video grid. Third, rather than relying on non-interactive and inconsistent 2D video interpolations, we lift the synthesized multi-view videos into an explicit 4D representation (i.e., 4DGS), regularized by a Flow Matching Distillation loss that exploits the multi-view prior to improve novel-view rendering. Extensive experiments demonstrate that our method outperforms existing approaches in both visual fidelity and geometric consistency, enabling full-scope 4D scene generation from single-view videos.
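The abstract does not define the Flow Matching Distillation loss. As background only, the sketch below shows the standard conditional flow-matching regression objective on a linear interpolation path, the kind of objective flow-matching generative priors are commonly trained with. How the paper turns such a multi-view prior into a distillation signal for 4DGS renders is not described here and is not reproduced; the function names `flow_matching_loss` and `mean_flow_matching_loss` and the deterministic t grid are hypothetical choices for illustration.

```python
from typing import Callable, List

Vector = List[float]
VelocityFn = Callable[[Vector, float], Vector]


def flow_matching_loss(velocity_fn: VelocityFn, x0: Vector, x1: Vector, t: float) -> float:
    """Conditional flow-matching loss on the linear path x_t = (1 - t) * x0 + t * x1.

    x0 is a noise sample, x1 a data sample and t in [0, 1]. Along this path the
    velocity dx_t/dt is the constant x1 - x0, which is the regression target.
    """
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    pred = velocity_fn(xt, t)
    # Mean squared error between predicted and target velocity.
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)


def mean_flow_matching_loss(velocity_fn: VelocityFn, x0: Vector, x1: Vector, num_t: int = 4) -> float:
    """Average the loss over evenly spaced t in (0, 1) instead of random sampling."""
    ts = [(k + 0.5) / num_t for k in range(num_t)]
    return sum(flow_matching_loss(velocity_fn, x0, x1, t) for t in ts) / num_t
```

A velocity model that outputs exactly x1 - x0 scores zero loss, while one that always predicts zero scores the mean squared magnitude of x1 - x0 (5.0 for x0 = [0, 0], x1 = [1, 3]).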
Problem

Research questions and friction points this paper is trying to address.

4D scene generation
single-view video
multi-view synthesis
ill-posed problem
full-scope reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D scene generation
multi-view video diffusion
time-view attention
4D Gaussian Splatting
Real-MV-4D dataset
Tingxi Chen
Shanghai Jiao Tong University, Institute of Artificial Intelligence, China Telecom (TeleAI)
Ke Hao
Shanghai Jiao Tong University, Institute of Artificial Intelligence, China Telecom (TeleAI)
Yabo Chen
Shanghai Jiao Tong University
Self-supervised Learning
Zhengxue Cheng
Assistant Researcher, Shanghai Jiao Tong University
Video and Image Coding, Computer Vision, Image Quality Assessment
Rong Xie
Shanghai Jiao Tong University
Li Song
Professor of Electronic Engineering, Shanghai Jiao Tong University
Video Coding, Image Processing, Computer Vision
Haibin Huang
Principal Research Scientist at TeleAI
Computer Graphics, Computer Vision, Geometric Modeling, 3D Deep Learning
Chi Zhang
Institute of Artificial Intelligence, China Telecom (TeleAI)
Xuelong Li
Institute of Artificial Intelligence, China Telecom (TeleAI)