SeeU: Seeing the Unseen World via 4D Dynamics-aware Generation

📅 2025-12-03

📈 Citations: 0

✨ Influential: 0

career value

202K/year

🤖 AI Summary

This work addresses the challenge of modeling continuous spatiotemporal dynamics from monocular 2D video—where conventional methods suffer from physical inconsistency in generated content due to inherent ambiguities in depth and motion. We propose the first 2D→4D→2D generative framework that explicitly models and reconstructs continuous 4D (3D spatial + temporal) dynamics from sparse single-frame inputs. Our approach represents the 4D scene as a low-rank tensor, jointly enforces kinematic and geometric physical constraints, and performs 4D scene inference and 2D reprojection via a spatiotemporal context-aware mechanism. Compared to 2D-centric baselines, our method significantly improves cross-temporal and cross-view consistency—as well as visual fidelity—in unseen-time synthesis, novel-view rendering, and video editing. It is the first to achieve physically interpretable, continuously differentiable 4D dynamic generation, establishing new state-of-the-art performance across multiple benchmarks.

Technology Category

Application Category

📝 Abstract

Images and videos are discrete 2D projections of the 4D world (3D space + time). Most visual understanding, prediction, and generation operate directly on 2D observations, leading to suboptimal performance. We propose SeeU, a novel approach that learns the continuous 4D dynamics and generate the unseen visual contents. The principle behind SeeU is a new 2D$ o$4D$ o$2D learning framework. SeeU first reconstructs the 4D world from sparse and monocular 2D frames (2D$ o$4D). It then learns the continuous 4D dynamics on a low-rank representation and physical constraints (discrete 4D$ o$continuous 4D). Finally, SeeU rolls the world forward in time, re-projects it back to 2D at sampled times and viewpoints, and generates unseen regions based on spatial-temporal context awareness (4D$ o$2D). By modeling dynamics in 4D, SeeU achieves continuous and physically-consistent novel visual generation, demonstrating strong potentials in multiple tasks including unseen temporal generation, unseen spatial generation, and video editing.

Problem

Research questions and friction points this paper is trying to address.

Generates unseen visual content via 4D dynamics

Learns continuous 4D world from sparse 2D frames

Achieves physically-consistent novel visual generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstructs 4D world from sparse monocular 2D frames

Learns continuous 4D dynamics with low-rank representation and constraints

Generates unseen visual content via 4D-to-2D reprojection and context awareness

🔎 Similar Papers

SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency