MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a four-dimensional reconstruction framework based on video diffusion models to jointly recover dense 3D geometry and scene motion from monocular videos. The method introduces a unified representation of dense 3D point maps and 3D scene flow within a shared coordinate system and designs a novel 4D variational autoencoder (VAE) for end-to-end learning. A key innovation lies in abandoning the conventional strategy of enforcing alignment between RGB and 3D latent spaces; instead, it employs a normalization scheme and VAE training mechanism specifically tailored for 4D data, effectively transferring diffusion priors. Experiments demonstrate that the approach achieves state-of-the-art performance across multiple benchmarks, improving geometric reconstruction accuracy by 38.64% and motion estimation by 25.0%, all without requiring post-optimization.

📝 Abstract
We introduce MotionCrafter, a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense motion from a monocular video. The core of our method is a novel joint representation of dense 3D point maps and 3D scene flows in a shared coordinate system, and a novel 4D VAE to effectively learn this representation. Unlike prior work that forces the 3D values and latents to align strictly with RGB VAE latents, despite their fundamentally different distributions, we show that such alignment is unnecessary and leads to suboptimal performance. Instead, we introduce a new data normalization and VAE training strategy that better transfers diffusion priors and greatly improves reconstruction quality. Extensive experiments across multiple datasets demonstrate that MotionCrafter achieves state-of-the-art performance in both geometry reconstruction and dense scene flow estimation, delivering 38.64% and 25.0% improvements in geometry and motion reconstruction, respectively, all without any post-optimization. Project page: https://ruijiezhu94.github.io/MotionCrafter_Page
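
To make the joint representation concrete, the sketch below stacks a point map and a scene flow, both expressed in one shared coordinate system, into a single 6-channel array per pixel. It is an illustration under stated assumptions, not the authors' implementation: the function names, the (T, H, W, 3) array layout, and in particular the mean-distance normalization are hypothetical stand-ins, since the abstract does not specify the normalization scheme.

```python
import numpy as np


def build_joint_representation(points, next_points):
    """Stack point maps and scene flow into one (T, H, W, 6) array.

    points:      (T, H, W, 3) 3D position of every pixel at its own time step.
    next_points: (T, H, W, 3) position of the same surface point one step later.
    Both are assumed to live in the same (shared) coordinate system, so the
    scene flow is a plain difference of positions.
    """
    flow = next_points - points
    return np.concatenate([points, flow], axis=-1)


def normalize_joint(joint, eps=1e-8):
    """Hypothetical normalization (NOT the paper's scheme).

    Points are centered and divided by their mean distance to the centroid.
    Flow is a displacement, so it is only divided by the same scale.
    """
    points, flow = joint[..., :3], joint[..., 3:]
    center = points.reshape(-1, 3).mean(axis=0)
    scale = np.linalg.norm(points - center, axis=-1).mean() + eps
    normalized = np.concatenate([(points - center) / scale, flow / scale], axis=-1)
    return normalized, center, scale


def denormalize_joint(normalized, center, scale):
    """Invert normalize_joint to recover metric points and flow."""
    points = normalized[..., :3] * scale + center
    flow = normalized[..., 3:] * scale
    return np.concatenate([points, flow], axis=-1)


# Toy data: 2 frames of a 4x5 point map with a small random displacement.
rng = np.random.default_rng(0)
points = rng.normal(size=(2, 4, 5, 3))
next_points = points + 0.1 * rng.normal(size=(2, 4, 5, 3))

joint = build_joint_representation(points, next_points)
normalized, center, scale = normalize_joint(joint)
restored = denormalize_joint(normalized, center, scale)
```

Because points and flow share one scale factor, adding the normalized flow to the normalized points still yields the normalized next-frame positions, which is the property a shared coordinate system is meant to preserve.
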
Problem

Research questions and friction points this paper is trying to address.

4D reconstruction
dense motion estimation
monocular video
geometry reconstruction
scene flow
Innovation

Methods, ideas, or system contributions that make the work stand out.

4D VAE
dense motion reconstruction
joint geometry-motion representation
diffusion-based video modeling
monocular 4D reconstruction