DuoMo: Dual Motion Diffusion for World-Space Human Reconstruction

📅 2026-03-03
🤖 AI Summary
This work addresses the recovery of globally consistent human motion in world coordinates from unconstrained videos whose observations may be noisy or incomplete. The authors propose a two-stage diffusion framework. The first stage estimates human motion in camera coordinates. The second stage lifts this estimate into world coordinates and refines it into a globally consistent trajectory. The method generates the motion of 3D mesh vertices directly, so it does not depend on a parametric body model. Decoupling camera-space and world-space modeling lets the method handle diverse scenes and trajectories, even when the input is highly noisy or partly missing. The approach reduces world-space reconstruction error by 16% on EMDB, while maintaining low foot skating, and by 30% on RICH, which is state-of-the-art performance.

📝 Abstract
We present DuoMo, a generative method that recovers human motion in world-space coordinates from unconstrained videos with noisy or incomplete observations. Reconstructing such motion requires solving a fundamental trade-off: generalizing from diverse and noisy video inputs while maintaining global motion consistency. Our approach addresses this problem by factorizing motion learning into two diffusion models. The camera-space model first estimates motion from videos in camera coordinates. The world-space model then lifts this initial estimate into world coordinates and refines it to be globally consistent. Together, the two models can reconstruct motion across diverse scenes and trajectories, even from highly noisy or incomplete observations. Moreover, our formulation is general, generating the motion of mesh vertices directly and bypassing parametric models. DuoMo achieves state-of-the-art performance. On EMDB, our method obtains a 16% reduction in world-space reconstruction error while maintaining low foot skating. On RICH, it obtains a 30% reduction in world-space error. Project page: https://yufu-wang.github.io/duomo/
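
To make the camera-to-world step concrete, the following is a minimal sketch of the rigid coordinate change that any such lifting builds on. It is not DuoMo's world-space model, which is a learned diffusion model that also refines the result. The sketch assumes a known per-frame camera-to-world rotation and translation, and the names `camera_to_world` and `lift_sequence` are hypothetical, not taken from the paper.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def camera_to_world(vertices_cam: Sequence[Vec3],
                    rotation_c2w: Mat3,
                    translation_c2w: Vec3) -> List[List[float]]:
    """Map mesh vertices from camera to world coordinates for one frame.

    Applies x_world = R @ x_cam + t to every vertex, where R and t are an
    assumed (given) camera-to-world rotation and translation.
    """
    world = []
    for v in vertices_cam:
        world.append([
            sum(rotation_c2w[i][j] * v[j] for j in range(3)) + translation_c2w[i]
            for i in range(3)
        ])
    return world


def lift_sequence(frames_cam: Sequence[Sequence[Vec3]],
                  rotations: Sequence[Mat3],
                  translations: Sequence[Vec3]) -> List[List[List[float]]]:
    """Lift a whole vertex trajectory frame by frame (no refinement)."""
    return [
        camera_to_world(frame, rot, trans)
        for frame, rot, trans in zip(frames_cam, rotations, translations)
    ]


# Illustrative values only: a camera rotated 90 degrees about the z-axis.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = [1.0, 2.0, 3.0]
trajectory = lift_sequence([[[1.0, 0.0, 0.0]]], [R], [t])
```

A purely rigid lift like this carries over any error in the camera-space estimate or the camera pose, which is the kind of inconsistency the abstract says the world-space model refines.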
Problem

Research questions and friction points this paper is trying to address.

world-space human reconstruction
motion consistency
noisy observations
unconstrained videos
human motion recovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual Diffusion
World-Space Motion Reconstruction
Camera-to-World Lifting
Mesh-Based Motion Generation
Noise-Robust Human Motion