MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE

📅 2026-02-09

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work proposes a four-dimensional reconstruction framework built on video diffusion models that jointly recovers dense 3D geometry and scene motion from monocular videos. The method introduces a unified representation of dense 3D point maps and 3D scene flow within a shared coordinate system and designs a novel 4D variational autoencoder (VAE) to learn this representation end to end. A key innovation is abandoning the conventional strategy of forcing 3D latents to align with the RGB VAE latent space. Instead, it uses a normalization scheme and VAE training strategy tailored to 4D data, which transfers diffusion priors more effectively. Experiments show state-of-the-art performance across multiple benchmarks, with reported improvements of 38.64% in geometry reconstruction and 25.0% in motion reconstruction, all without post-optimization.
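To make the shared-coordinate idea concrete, here is a minimal sketch, assuming each pixel of frame t stores a 3D point X_t and a scene-flow vector F_t expressed in one common world frame, so that the point's position at t+1 is X_t + F_t. The names `JointFrame`, `advect`, and `pack_channels`, the X_t/F_t notation, and the 6-channel packing are hypothetical illustrations, not the paper's actual data layout.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class JointFrame:
    """Hypothetical per-frame record: one 3D point and one 3D flow vector per pixel,
    both expressed in the same shared world coordinate system (an assumption)."""
    points: List[List[Vec3]]  # H x W grid of 3D positions X_t
    flow: List[List[Vec3]]    # H x W grid of 3D displacements F_t (t -> t+1)


def advect(frame: JointFrame) -> List[List[Vec3]]:
    """Move every point by its scene-flow vector: X_{t+1} = X_t + F_t.

    This only works because points and flow share one coordinate system;
    no per-frame camera transform is needed.
    """
    return [
        [tuple(p + f for p, f in zip(pt, fl)) for pt, fl in zip(prow, frow)]
        for prow, frow in zip(frame.points, frame.flow)
    ]


def pack_channels(frame: JointFrame) -> List[List[Tuple[float, ...]]]:
    """Concatenate point and flow per pixel into a 6-channel map (xyz + dxdydz),
    the kind of joint tensor a 4D VAE could encode (illustrative layout only)."""
    return [
        [pt + fl for pt, fl in zip(prow, frow)]
        for prow, frow in zip(frame.points, frame.flow)
    ]


if __name__ == "__main__":
    frame = JointFrame(
        points=[[(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]],
        flow=[[(0.1, 0.0, 0.0), (0.0, -0.5, 0.0)]],
    )
    print(advect(frame))         # [[(0.1, 0.0, 2.0), (1.0, -0.5, 2.0)]]
    print(pack_channels(frame))  # 1 x 2 grid of 6-tuples
```

Keeping geometry and motion in the same frame means a downstream consumer can recover next-frame geometry with a single addition, which is one reason a joint representation is convenient to learn and evaluate.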

📝 Abstract

We introduce MotionCrafter, a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense motion from a monocular video. The core of our method is a novel joint representation of dense 3D point maps and 3D scene flows in a shared coordinate system, and a novel 4D VAE to effectively learn this representation. Unlike prior work that forces 3D values and latents to align strictly with RGB VAE latents (despite their fundamentally different distributions), we show that such alignment is unnecessary and leads to suboptimal performance. Instead, we introduce a new data normalization and VAE training strategy that better transfers diffusion priors and greatly improves reconstruction quality. Extensive experiments across multiple datasets demonstrate that MotionCrafter achieves state-of-the-art performance in both geometry reconstruction and dense scene flow estimation, delivering 38.64% and 25.0% improvements in geometry and motion reconstruction, respectively, all without any post-optimization. Project page: https://ruijiezhu94.github.io/MotionCrafter_Page
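The abstract does not spell out the normalization, so the following is only a minimal sketch of one plausible clip-level scheme, not MotionCrafter's actual method. It assumes points are centered on their clip-wide mean and divided by their mean distance from it, while flow vectors, being displacements, are divided by the same scale but never shifted. The functions `normalize_clip` and `denormalize` are hypothetical names.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def normalize_clip(
    points: List[Vec3], flow: List[Vec3]
) -> Tuple[List[Vec3], List[Vec3], Vec3, float]:
    """Hypothetical clip-level normalization for point/flow pairs.

    Points: subtract the clip centroid, then divide by the mean distance to it.
    Flow:   divide by the same scale only (a displacement has no origin).
    """
    n = len(points)
    center = tuple(sum(p[i] for p in points) / n for i in range(3))
    centered = [tuple(p[i] - center[i] for i in range(3)) for p in points]
    scale = sum(math.sqrt(sum(c * c for c in q)) for q in centered) / n
    if scale == 0.0:
        scale = 1.0  # degenerate clip where all points coincide
    norm_points = [tuple(c / scale for c in q) for q in centered]
    norm_flow = [tuple(c / scale for c in f) for f in flow]
    return norm_points, norm_flow, center, scale


def denormalize(
    points: List[Vec3], flow: List[Vec3], center: Vec3, scale: float
) -> Tuple[List[Vec3], List[Vec3]]:
    """Invert normalize_clip: rescale both, re-add the center to points only."""
    out_points = [tuple(c * scale + center[i] for i, c in enumerate(q)) for q in points]
    out_flow = [tuple(c * scale for c in f) for f in flow]
    return out_points, out_flow


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.0, 0.0, 4.0)]
    fl = [(0.0, 0.0, 2.0), (0.0, 0.0, 0.0)]
    n_pts, n_fl, c, s = normalize_clip(pts, fl)
    print(n_pts, n_fl, c, s)  # points in [-1, 1] along z, scale 2.0
```

The design choice worth noting is that sharing one scale and skipping the shift for flow keeps the relation X_{t+1} = X_t + F_t valid in normalized space, so a model can be trained on normalized values and its outputs mapped back to the original metric frame with `denormalize`.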

Problem

Research questions and friction points this paper is trying to address.

4D reconstruction

dense motion estimation

monocular video

geometry reconstruction

scene flow

Innovation

Methods, ideas, or system contributions that make the work stand out.

4D VAE

dense motion reconstruction

joint geometry-motion representation