HumanMM: Global Human Motion Recovery from Multi-shot Videos

📅 2025-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses discontinuous 3D human motion reconstruction in multi-shot in-the-wild videos, where shot transitions, dynamic backgrounds, and occlusions break continuity. We propose an end-to-end framework that integrates a custom shot transition detector, a robust cross-shot alignment module, enhanced camera pose estimation, and an improved HMR backbone coupled with a motion integrator; together these mitigate foot sliding and ensure temporal consistency. To our knowledge, this is the first method to achieve long-sequence, globally consistent 3D reconstruction in world coordinates across shot transitions. Evaluated on a newly constructed multi-shot dataset built from public 3D human datasets, our approach outperforms single-shot baselines, with global trajectory error reduced by 32.7%. The resulting spatiotemporally coherent reconstructions provide a foundation for downstream tasks such as motion generation and motion understanding.

📝 Abstract
In this paper, we present a novel framework designed to reconstruct long-sequence 3D human motion in world coordinates from in-the-wild videos with multiple shot transitions. Such long-sequence in-the-wild motions are highly valuable to applications such as motion generation and motion understanding, but they are very challenging to recover because of the abrupt shot transitions, partial occlusions, and dynamic backgrounds present in such videos. Existing methods primarily focus on single-shot videos, where continuity is maintained within a single camera view, or simplify multi-shot alignment to camera space only. In this work, we tackle these challenges by integrating enhanced camera pose estimation with Human Motion Recovery (HMR), incorporating a shot transition detector and a robust alignment module to keep pose and orientation continuous across shots. By leveraging a custom motion integrator, we effectively mitigate foot sliding and ensure temporal consistency in human pose. Extensive evaluations on a multi-shot dataset that we created from public 3D human datasets demonstrate the robustness of our method in reconstructing realistic human motion in world coordinates.
Problem

Research questions and friction points this paper is trying to address.

Reconstruct long-sequence 3D human motion from multi-shot videos.
Address challenges like abrupt shot transitions and occlusions.
Ensure accurate pose continuity and temporal consistency in motion.
Innovation

Methods, ideas, or system contributions that make the work stand out.

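Enhanced camera pose estimation integrated with HMR.
Shot transition detector and robust alignment module for pose and orientation continuity across shots.
Custom motion integrator that mitigates foot sliding and ensures temporal consistency.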
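
To make the cross-shot alignment idea concrete, here is a minimal sketch in plain Python. It is not the paper's alignment module: it assumes, purely for illustration, that the person's root pose at the first frame of a new shot equals the last world-space root pose of the previous shot, and it solves for the single rigid transform that maps the new shot's camera-space root poses into world coordinates. The function name `align_shot_to_world` and all helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    """Transpose a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def apply(r, v):
    """Rotate a 3-vector v by the 3x3 matrix r."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_y(theta):
    """Rotation by theta radians about the vertical (y) axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def align_shot_to_world(world_rot_prev, world_pos_prev, cam_rots, cam_positions):
    """Map camera-space root poses of a new shot into world coordinates.

    Assumption (illustrative only): the root pose is continuous across the
    cut, so the first frame of the new shot must coincide with the last
    world-space pose of the previous shot.
    """
    # Solve r_align @ cam_rots[0] == world_rot_prev for r_align.
    r_align = matmul(world_rot_prev, transpose(cam_rots[0]))
    # Solve r_align @ cam_positions[0] + t_align == world_pos_prev for t_align.
    moved = apply(r_align, cam_positions[0])
    t_align = [world_pos_prev[i] - moved[i] for i in range(3)]
    # Apply the same rigid transform to every frame of the new shot.
    world_rots = [matmul(r_align, r) for r in cam_rots]
    world_positions = [
        [p + t for p, t in zip(apply(r_align, pos), t_align)]
        for pos in cam_positions
    ]
    return world_rots, world_positions
```

Because one rigid transform is applied to the whole shot, relative motion inside the shot is preserved and only the seam at the cut is fixed. A real system would also have to handle a moving camera within the shot and noisy pose estimates near the cut, which this sketch ignores.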