WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling

📅 2025-12-08
🤖 AI Summary
Existing video generation methods struggle with 3D/4D consistency and often fail to maintain geometric stability and plausible dynamics at the same time. This work introduces an end-to-end 4D-consistent video generation framework that explicitly models a joint spatiotemporal representation (RGB frames, pointmaps, camera trajectories, and dense flow), so that one underlying scene stays stable across viewpoints and over time. Training combines synthetic data, which supplies precise geometry, motion, and camera pose supervision, with real-world videos, which add visual diversity and photorealism, via multimodal joint representation learning and explicit consistency regularization. Experiments show improved geometric consistency and motion coherence under challenging conditions, including dynamic non-rigid scenes and large camera motion. The method suppresses view-time artifacts (e.g., flickering, ghosting, depth inconsistency) and is reported to achieve state-of-the-art results on quantitative 4D consistency metrics.

📝 Abstract
Recent video generators achieve striking photorealism, yet remain fundamentally inconsistent in 3D. We present WorldReel, a 4D video generator that is natively spatiotemporally consistent. WorldReel jointly produces RGB frames together with 4D scene representations, including pointmaps, camera trajectory, and dense flow mapping, enabling coherent geometry and appearance modeling over time. Our explicit 4D representation enforces a single underlying scene that persists across viewpoints and dynamic content, yielding videos that remain consistent even under large non-rigid motion and significant camera movement. We train WorldReel by carefully combining synthetic and real data: synthetic data provides precise 4D supervision (geometry, motion, and camera), while real videos contribute visual diversity and realism. This blend allows WorldReel to generalize to in-the-wild footage while preserving strong geometric fidelity. Extensive experiments demonstrate that WorldReel sets a new state of the art for consistent video generation with dynamic scenes and moving cameras, improving geometric consistency and motion coherence and reducing view-time artifacts relative to competing methods. We believe that WorldReel brings video generation closer to 4D-consistent world modeling, where agents can render, interact with, and reason about scenes through a single, stable spatiotemporal representation.
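
To make the idea of "a single underlying scene" concrete, the following is a minimal Python sketch of one possible consistency check. It is not WorldReel's implementation, which the abstract does not describe at this level. It assumes each frame carries 3D points in camera coordinates, a camera-to-world pose, and a per-point world-space displacement to the next frame, and that points at the same index in consecutive frames correspond to each other. The names `Frame4D`, `to_world`, and `consistency_residual` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Frame4D:
    """Hypothetical per-frame bundle; field names are illustrative only."""
    points_cam: List[Vec3]   # 3D points in this frame's camera coordinates
    rotation: Mat3           # camera-to-world rotation
    translation: Vec3        # camera-to-world translation
    scene_flow: List[Vec3]   # world-space displacement of each point to the next frame


def to_world(frame: Frame4D) -> List[Vec3]:
    """Map camera-space points into the shared world frame: x_w = R @ x_c + t."""
    R, t = frame.rotation, frame.translation
    return [
        tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))
        for p in frame.points_cam
    ]


def consistency_residual(curr: Frame4D, nxt: Frame4D) -> float:
    """Mean distance between flow-advected points of `curr` and the points of `nxt`.

    Assumes index i in both frames refers to the same physical point.
    """
    world_curr, world_next = to_world(curr), to_world(nxt)
    if not world_curr or len(world_curr) != len(world_next):
        raise ValueError("frames must contain the same non-zero number of points")
    total = 0.0
    for p, f, q in zip(world_curr, curr.scene_flow, world_next):
        predicted = tuple(p[i] + f[i] for i in range(3))
        total += math.dist(predicted, q)
    return total / len(world_curr)


IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Frame 0: camera at the origin; one static point and one point that moves by (0, 0.5, 0).
frame0 = Frame4D(
    points_cam=[(0.0, 0.0, 5.0), (1.0, 1.0, 4.0)],
    rotation=IDENTITY,
    translation=(0.0, 0.0, 0.0),
    scene_flow=[(0.0, 0.0, 0.0), (0.0, 0.5, 0.0)],
)
# Frame 1: camera has moved to (1, 0, 0), so the same points shift in camera space.
frame1 = Frame4D(
    points_cam=[(-1.0, 0.0, 5.0), (0.0, 1.5, 4.0)],
    rotation=IDENTITY,
    translation=(1.0, 0.0, 0.0),
    scene_flow=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
)

print(consistency_residual(frame0, frame1))  # 0.0
```

A residual of zero means geometry, camera motion, and flow all describe the same scene. A generator whose frames drift geometrically would show a growing residual under this kind of check.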

Problem

Research questions and friction points this paper is trying to address.

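Photorealistic video generators remain fundamentally inconsistent in 3D
Generated content does not persist as a single underlying scene across viewpoints and dynamics
Geometric consistency is hardest to maintain under large non-rigid motion and significant camera movement, where view-time artifacts appear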

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates 4D videos with consistent geometry and motion
Uses explicit 4D scene representations like pointmaps and flow
Trains with a blend of synthetic and real data
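
The summary does not say how the synthetic and real data are combined during training. One common pattern, shown here purely as an assumption, is to apply every supervised loss term to synthetic samples (which have ground-truth geometry, flow, and camera) and only the appearance term to real videos (which lack those labels). The function name `combine_losses`, the term names, and the weights below are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical loss weights; the source gives no values.
WEIGHTS: Dict[str, float] = {"rgb": 1.0, "geometry": 1.0, "flow": 1.0, "camera": 1.0}


def combine_losses(terms: Dict[str, Optional[float]]) -> float:
    """Weighted sum over the loss terms that have ground truth.

    A value of None marks a term with no supervision for this sample,
    so it contributes nothing to the total.
    """
    return sum(
        WEIGHTS[name] * value
        for name, value in terms.items()
        if value is not None
    )


# Synthetic sample: all 4D annotations are available.
synthetic_sample = {"rgb": 0.25, "geometry": 0.5, "flow": 0.125, "camera": 0.125}
# Real video sample: only the appearance term is supervised (an assumption).
real_sample = {"rgb": 0.25, "geometry": None, "flow": None, "camera": None}

print(combine_losses(synthetic_sample))  # 1.0
print(combine_losses(real_sample))       # 0.25
```

Masking unsupervised terms this way lets both data sources share one training loop, at the cost that geometric quality on real footage depends on how well the synthetic supervision transfers.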