DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of achieving both temporally coherent whole-body motion and high-fidelity hand detail in monocular video-based human mesh reconstruction. The authors propose a temporally consistent whole-body reconstruction framework that recovers SMPL-X body and hand motion jointly within a single temporal architecture, rather than reconstructing each frame independently as image-based whole-body methods do. A residual body–hand feature fusion mechanism combines body context with part-specific hand observations, preserving stable body dynamics while recovering fine-grained hand poses. A close-up-aware data augmentation strategy further improves robustness when the framing is centered on the upper body. Experiments on whole-body and body-only benchmarks show improved hand accuracy with competitive body pose accuracy, and the method produces temporally stable SMPL-X sequences that align well with 2D observations in challenging real-world scenes.
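The summary does not spell out how the residual fusion is computed. The sketch below shows one generic form of residual feature fusion and should not be read as the paper's architecture. It assumes body context is the base feature and a projected hand-crop feature is added as a correction, so the output falls back to the body feature when no hand observation exists. The names `residual_body_hand_fusion`, `w_proj`, and the tiny dimensions are hypothetical.

```python
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply an (out_dim x in_dim) matrix by an in_dim vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def residual_body_hand_fusion(
    body_ctx: Vector, hand_obs: Optional[Vector], w_proj: Matrix
) -> Vector:
    """Hypothetical residual fusion: body context plus a projected hand correction.

    body_ctx: body-derived feature for the hand region (the residual base).
    hand_obs: feature from a hand crop, or None if the hand is not observed.
    w_proj:   hypothetical learned projection from hand to body feature space.
    """
    if hand_obs is None:
        # No hand observation (e.g., out of frame): keep the body estimate.
        return list(body_ctx)
    correction = matvec(w_proj, hand_obs)
    if len(correction) != len(body_ctx):
        raise ValueError("projection output size must match body feature size")
    # Residual connection: the hand branch only adds a correction on top.
    return [b + c for b, c in zip(body_ctx, correction)]


body = [1.0, 2.0]
hand = [0.5, -0.5, 1.0]
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(residual_body_hand_fusion(body, hand, w))  # [1.5, 2.5]
```

With a residual form, a zero-initialized projection leaves the body prediction untouched at the start of training. This is a common reason to prefer residual fusion over plain concatenation.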

📝 Abstract

Monocular video human mesh recovery is essential for digital humans, avatar animation, and embodied simulation, where both temporal stability and expressive whole-body motion are required. Existing video HMR methods produce coherent body motion but often overlook detailed hand articulation, while image-based whole-body methods recover SMPL-X meshes independently per frame, often leading to jittery and inaccurate hand motion. We present a temporally coherent whole-body HMR framework for challenging in-the-wild monocular videos. Our model unifies body context and part-specific hand observations through residual body-hand fusion, enabling stable body motion and detailed hand recovery within a single temporal architecture. We further introduce close-up-aware augmentation to improve robustness under upper-body framing. Experiments on whole-body and body-only benchmarks demonstrate improved hand reconstruction and competitive body accuracy. Our method also produces temporally stable and 2D-consistent SMPL-X motion in challenging real-world videos.
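The abstract does not say how close-up-aware augmentation is implemented. The following is a minimal sketch of one plausible variant, not the authors' procedure. It assumes the augmentation truncates the person's bounding box to its upper portion to simulate upper-body framing, and it marks 2D keypoints that fall outside the new crop as invisible. The function name `close_up_crop` and the parameters `prob`, `min_frac`, and `max_frac` are hypothetical. Image coordinates follow the usual convention where y grows downward.

```python
import random
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), y grows downward
Keypoint = Tuple[float, float, float]     # (x, y, visibility)


def close_up_crop(
    bbox: BBox,
    keypoints: List[Keypoint],
    rng: random.Random,
    prob: float = 0.5,
    min_frac: float = 0.4,
    max_frac: float = 0.7,
) -> Tuple[BBox, List[Keypoint]]:
    """Hypothetical close-up augmentation: keep only the top part of the box.

    With probability `prob`, the box is cut to a random fraction of its height,
    measured from the top, and keypoints below the cut get visibility 0.
    """
    if rng.random() >= prob:
        return bbox, list(keypoints)  # no augmentation this sample
    x0, y0, x1, y1 = bbox
    frac = rng.uniform(min_frac, max_frac)
    new_y1 = y0 + frac * (y1 - y0)
    crop = (x0, y0, x1, new_y1)
    out = []
    for x, y, v in keypoints:
        inside = x0 <= x <= x1 and y0 <= y <= new_y1
        out.append((x, y, v if inside else 0.0))
    return crop, out


rng = random.Random(0)
box, kps = close_up_crop((0, 0, 100, 200), [(50, 20, 1.0), (50, 190, 1.0)],
                         rng, prob=1.0, min_frac=0.5, max_frac=0.5)
print(box, kps)  # (0, 0, 100, 100.0) [(50, 20, 1.0), (50, 190, 0.0)]
```

Masking out-of-crop keypoints, rather than dropping them, keeps the keypoint array shape fixed, which simplifies batching in a temporal model.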

Problem

Research questions and friction points this paper is trying to address.

human mesh recovery

monocular video

whole-body motion

hand articulation

temporal coherence

Innovation

Methods, ideas, or system contributions that make the work stand out.

temporal coherence

hand-aware mesh recovery

residual body-hand fusion

monocular video

SMPL-X

🔎 Similar Papers

DiffMesh: A Motion-aware Diffusion Framework for Human Mesh Recovery from Videos

2023-03-23 · Citations: 0