DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos

📅 2026-05-18
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the challenge of achieving both temporal coherence in whole-body motion and high-fidelity hand detail when recovering human meshes from monocular video. The authors propose a temporally coherent whole-body reconstruction framework that recovers SMPL-X body and hand motion within a single temporal architecture. A residual body-hand fusion mechanism combines body context with part-specific hand observations, so the model preserves stable body dynamics while recovering fine-grained hand poses. A close-up-aware data augmentation strategy improves robustness under upper-body framing. Experiments on whole-body and body-only benchmarks show improved hand reconstruction with competitive body accuracy, and the method produces temporally stable SMPL-X sequences that stay consistent with 2D observations in challenging real-world videos.
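The summary does not say how the close-up-aware augmentation is implemented. As a rough illustration only, one plausible reading is to crop training frames around the upper-body joints so the model sees upper-body framing during training. The function name `closeup_crop_box`, the joint indices, and the `margin` parameter below are hypothetical and not taken from the paper.

```python
import numpy as np

def closeup_crop_box(keypoints_2d, upper_idx, margin=0.1):
    """Return an (x_min, y_min, x_max, y_max) box around upper-body joints.

    Hypothetical helper: the paper's actual augmentation is not described
    in the summary, so this only illustrates the general idea.
    """
    pts = keypoints_2d[upper_idx]          # select upper-body joints only
    lo = pts.min(axis=0)                   # top-left corner of the tight box
    hi = pts.max(axis=0)                   # bottom-right corner of the tight box
    pad = margin * (hi - lo)               # pad proportionally to box size
    return np.concatenate([lo - pad, hi + pad])

# Toy skeleton in pixel coordinates: head, shoulders, hips, ankles.
kps = np.array([
    [50.0, 10.0],                  # head
    [30.0, 40.0], [70.0, 40.0],    # shoulders
    [40.0, 120.0], [60.0, 120.0],  # hips
    [45.0, 200.0], [55.0, 200.0],  # ankles
])
upper = [0, 1, 2]                  # assumed indices of upper-body joints
box = closeup_crop_box(kps, upper, margin=0.1)
print(box)
```

Cropping each frame to such a box leaves the lower body out of view, which mimics the upper-body framing the augmentation is meant to handle.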
📝 Abstract
Monocular video human mesh recovery is essential for digital humans, avatar animation, and embodied simulation, where both temporal stability and expressive whole-body motion are required. Existing video HMR methods produce coherent body motion but often overlook detailed hand articulation, while image-based whole-body methods recover SMPL-X meshes independently per frame, often leading to jittery and inaccurate hand motion. We present a temporally coherent whole-body HMR framework for challenging in-the-wild monocular videos. Our model unifies body context and part-specific hand observations through residual body-hand fusion, enabling stable body motion and detailed hand recovery within a single temporal architecture. We further introduce close-up-aware augmentation to improve robustness under upper-body framing. Experiments on whole-body and body-only benchmarks demonstrate improved hand reconstruction and competitive body accuracy. Our method also produces temporally stable and 2D-consistent SMPL-X motion in challenging real-world videos.
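The abstract names residual body-hand fusion but gives no equations. A minimal sketch of the general idea, assuming per-frame body and hand feature vectors and a learned linear projection, might look like the following. The function name, feature sizes, and the `gate` parameter are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def residual_body_hand_fusion(body_feat, hand_feat, w_proj, gate=1.0):
    """Add projected hand features to body features as a residual.

    body_feat: (T, D_body) per-frame body context features
    hand_feat: (T, D_hand) per-frame hand-crop features
    w_proj:    (D_hand, D_body) projection (learned in a real model)
    All shapes and names are assumptions made for this sketch.
    """
    residual = hand_feat @ w_proj          # map hand features into body space
    return body_feat + gate * residual     # body stream passes through unchanged

rng = np.random.default_rng(0)
T, D_body, D_hand = 4, 8, 6
body = rng.normal(size=(T, D_body))
hand = rng.normal(size=(T, D_hand))

# With a zero projection the fused features equal the body features,
# which is why a residual path can leave the body estimate undisturbed.
w_zero = np.zeros((D_hand, D_body))
fused = residual_body_hand_fusion(body, hand, w_zero)
print(np.allclose(fused, body))
```

The design point this illustrates is that a residual connection lets hand evidence refine the representation without overwriting the body stream, which matches the abstract's goal of stable body motion alongside detailed hand recovery.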
Problem

Research questions and friction points this paper is trying to address.

human mesh recovery
monocular video
whole-body motion
hand articulation
temporal coherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

temporal coherence
hand-aware mesh recovery
residual body-hand fusion
monocular video
SMPL-X