4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses reconstructing dynamic 4D (3D space + time) object representations from sparse spatiotemporal observations, i.e., a few camera viewpoints at a limited set of time steps. The model learns a unified 4D Gaussian primitive representation that jointly encodes spatiotemporal geometry and appearance as explicit, differentiable, renderable primitives. To generalize across objects, cameras, and time steps, the authors pretrain at large scale: the network directly regresses per-pixel 4D Gaussian parameters from pose-conditioned image tokens and uses efficient Gaussian rendering to synthesize high-fidelity 24-frame sequences in a single forward pass. Inference takes under 1.5 seconds on a single A100 GPU and supports novel view synthesis from arbitrary viewpoints as well as sub-frame temporal interpolation. The paper presents this as the first demonstration that large-scale 4D pretraining is feasible and effective for dynamic object reconstruction.
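The summary's core mechanism, regressing per-pixel 4D Gaussian parameters from image tokens, can be sketched as a patch-unshuffle projection head. This is an illustrative assumption, not the released implementation: patch size, parameter count, and the flat per-Gaussian layout below are all hypothetical.

```python
import numpy as np

PATCH = 8      # assumed patch size used when images were tokenized
N_PARAMS = 23  # assumed per-Gaussian layout, e.g. xyz(3)+t(1)+scales(4)+rotation(4)+opacity(1)+color(3)+...

def tokens_to_gaussians(tokens, h, w, weight, bias):
    """Project image tokens (n_tokens, d) to per-pixel Gaussian params (h, w, N_PARAMS).

    Each token is decoded to a PATCH x PATCH block of pixels, each pixel
    receiving its own parameter vector (one Gaussian per pixel).
    """
    per_patch = tokens @ weight + bias                      # (n_tokens, PATCH*PATCH*N_PARAMS)
    n_h, n_w = h // PATCH, w // PATCH
    x = per_patch.reshape(n_h, n_w, PATCH, PATCH, N_PARAMS)
    x = x.transpose(0, 2, 1, 3, 4).reshape(h, w, N_PARAMS)  # unshuffle patches to a pixel grid
    return x

# Toy shapes to show the bookkeeping: a 32x32 image tokenized into 16 tokens.
rng = np.random.default_rng(0)
d, h, w = 64, 32, 32
n_tokens = (h // PATCH) * (w // PATCH)
tokens = rng.normal(size=(n_tokens, d))
weight = rng.normal(size=(d, PATCH * PATCH * N_PARAMS)) * 0.02
bias = np.zeros(PATCH * PATCH * N_PARAMS)
gaussians = tokens_to_gaussians(tokens, h, w, tokens_to_gaussians.__defaults__ or weight, bias) if False else tokens_to_gaussians(tokens, h, w, weight, bias)
print(gaussians.shape)  # one 23-dim Gaussian per pixel
```

Because every pixel of every input view emits one primitive, the number of Gaussians scales with input resolution and view count rather than scene complexity, which is what makes a single feed-forward pass sufficient.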

📝 Abstract
Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at some times to any view at any time? We provide an affirmative answer with 4D-LRM, the first large-scale 4D reconstruction model that takes input from unconstrained views and timestamps and renders arbitrary novel view-time combinations. Unlike prior 4D approaches, e.g., optimization-based, geometry-based, or generative, that struggle with efficiency, generalization, or faithfulness, 4D-LRM learns a unified space-time representation and directly predicts per-pixel 4D Gaussian primitives from posed image tokens across time, enabling fast, high-quality rendering at, in principle, infinite frame rate. Our results demonstrate that scaling spatiotemporal pretraining enables accurate and efficient 4D reconstruction. We show that 4D-LRM generalizes to novel objects, interpolates across time, and handles diverse camera setups. It reconstructs 24-frame sequences in one forward pass in less than 1.5 seconds on a single A100 GPU.
Problem

Research questions and friction points this paper is trying to address.

Reconstruct 4D objects from limited views and times
Render novel views and times efficiently and accurately
Generalize across diverse objects and camera setups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified space-time representation learning
Predicts 4D Gaussian primitives directly
Fast, high-quality rendering at arbitrary frame rates
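The "infinite frame rate" claim above follows from each primitive living in continuous time. A minimal sketch, assuming a common 4D Gaussian parameterization (a temporal center `mu_t` and scale `sigma_t` per primitive, not necessarily the paper's exact form): at any query time, a primitive's opacity is modulated by its 1D temporal Gaussian marginal, so timestamps between training frames render smoothly.

```python
import numpy as np

def temporal_opacity(base_opacity, t, mu_t, sigma_t):
    """Opacity of a 4D Gaussian evaluated at a continuous query time t.

    Assumed form: the spatial Gaussian's opacity scaled by its temporal
    marginal, exp(-(t - mu_t)^2 / (2 sigma_t^2)). At t = mu_t the primitive
    is fully visible; it fades for queries farther from its temporal center.
    """
    return base_opacity * np.exp(-0.5 * ((t - mu_t) / sigma_t) ** 2)

# A primitive centered at frame 10: querying t = 10.5, between frames,
# still yields a well-defined, smoothly decayed opacity (sub-frame interpolation).
for t in (10.0, 10.5, 11.0):
    print(t, temporal_opacity(0.8, t, mu_t=10.0, sigma_t=0.7))
```

Since evaluation at any `t` is closed-form, rendering an intermediate timestamp costs the same as rendering a training frame, with no per-frame optimization.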