Natural Human Motion Recovery by Aligning High-Order Temporal Dynamics from Monocular Videos

📅 2026-05-26

🤖 AI Summary

This work addresses a common failure of monocular video-based human motion recovery: over-smoothing and kinematic inconsistency caused by the lack of high-order temporal dynamics such as velocity and acceleration. To this end, the authors propose HTD-Refine, a post-processing framework that explicitly incorporates high-order temporal dynamics to improve motion plausibility. At its core, PVA-Net uses a temporal Transformer to jointly predict per-joint 2D positions, 3D velocities, and 3D accelerations from monocular video; these predictions are then integrated as soft constraints into a physics-inspired global trajectory optimization. The entire pipeline is end-to-end trainable and consistently improves existing state-of-the-art methods across multiple in-the-wild benchmarks, reducing over-smoothing and jitter while recovering more accurate global trajectories and more natural dynamic motion.
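Velocity and acceleration here are the first and second time derivatives of joint positions. The minimal NumPy sketch below is illustrative and not taken from HTD-Refine. It estimates both quantities with finite differences and shows why over-smoothing hides them: a box filter (the hypothetical `moving_average`, standing in for an over-smoothing estimator) applied to a fast 3 Hz joint oscillation sharply lowers the peak acceleration. It assumes trajectories stored as a (T, J, 3) array sampled at a fixed frame rate; `finite_difference_dynamics` is likewise a hypothetical helper.

```python
import numpy as np


def finite_difference_dynamics(positions, fps=30.0):
    """Estimate joint velocity and acceleration with forward finite differences.

    positions: array of shape (T, J, 3) (frames, joints, xyz), e.g. in metres.
    Returns velocity of shape (T-1, J, 3) and acceleration of shape (T-2, J, 3),
    in units per second and per second squared.
    """
    dt = 1.0 / fps
    velocity = np.diff(positions, n=1, axis=0) / dt
    acceleration = np.diff(positions, n=2, axis=0) / dt**2
    return velocity, acceleration


def moving_average(positions, window=9):
    """Temporal box filter: a crude stand-in for an over-smoothing estimator."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the output keeps T frames")
    kernel = np.ones(window) / window
    pad = window // 2
    # Repeat the first/last frame so the filtered output has the same length.
    padded = np.pad(positions, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    smoothed = np.empty_like(positions, dtype=float)
    for j in range(positions.shape[1]):
        for c in range(positions.shape[2]):
            smoothed[:, j, c] = np.convolve(padded[:, j, c], kernel, mode="valid")
    return smoothed


# Example: one joint bouncing vertically at 3 Hz, sampled at 30 fps for 3 s.
t = np.arange(90) / 30.0
positions = np.zeros((90, 1, 3))
positions[:, 0, 2] = 0.1 * np.sin(2 * np.pi * 3.0 * t)

_, acc_raw = finite_difference_dynamics(positions)
_, acc_smooth = finite_difference_dynamics(moving_average(positions))
print(np.abs(acc_raw).max(), np.abs(acc_smooth).max())
```

A 90-frame clip yields 89 velocity frames and 88 acceleration frames; the printed peaks show how much of the fast motion the filter has removed.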

📝 Abstract

Human motion recovered from monocular videos often appears overly smooth or dynamically inconsistent, even when joint positions are numerically accurate. We observe that this limitation stems from the absence of reliable high-order temporal cues (velocity and acceleration), which are essential for reconstructing motion that exhibits realistic momentum, timing, and high-frequency detail. We introduce HTD-Refine, a post-processing framework that augments existing Human Motion Recovery (HMR) pipelines using explicitly estimated high-order temporal dynamics. At the core of our system is PVA-Net, a temporal transformer that infers per-joint 2D positions, 3D velocities, and 3D accelerations directly from a monocular video. These predicted dynamics serve as soft yet informative constraints in a global optimization procedure that refines world-space trajectories, significantly reducing jitter, suppressing over-smoothing, and restoring physically plausible motion. Extensive experiments on challenging in-the-wild benchmarks show that HTD-Refine consistently improves state-of-the-art HMR methods, yielding more accurate global trajectories and substantially more natural motion dynamics. Our results highlight the critical role of high-order temporal modeling in advancing monocular human motion recovery.
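The abstract says the predicted dynamics act as soft constraints in a global optimization over world-space trajectories. A minimal NumPy sketch of that idea follows, under an assumed quadratic objective that is an illustration, not the paper's formulation. Starting from an initial trajectory X0 produced by an upstream HMR method, it seeks a trajectory X that stays close to X0 while its finite-difference velocity and acceleration match predicted values V̂ and Â. Because every term is linear in X, the problem reduces to a single linear least-squares solve. The helper names `difference_matrix` and `refine_trajectory`, the weights `w_vel` and `w_acc`, and the fixed frame rate are all hypothetical; the example feeds in ground-truth dynamics in place of network predictions.

```python
import numpy as np


def difference_matrix(T, order):
    """(T - order) x T matrix applying an `order`-th forward difference in time."""
    D = np.eye(T)
    for _ in range(order):
        D = D[1:] - D[:-1]
    return D


def refine_trajectory(init_positions, pred_velocity, pred_acceleration,
                      fps=30.0, w_vel=1.0, w_acc=1.0):
    """Refine a (T, J, 3) trajectory with soft velocity/acceleration constraints.

    Minimizes, independently for every joint coordinate,
        ||x - x0||^2 + w_vel * ||D1 x - v_hat * dt||^2 + w_acc * ||D2 x - a_hat * dt^2||^2
    where D1 and D2 are first- and second-order difference matrices.
    """
    T, J, C = init_positions.shape
    dt = 1.0 / fps
    D1 = difference_matrix(T, 1)
    D2 = difference_matrix(T, 2)
    # Stack the three quadratic terms into one overdetermined system A x ~= b.
    A = np.vstack([np.eye(T), np.sqrt(w_vel) * D1, np.sqrt(w_acc) * D2])
    b = np.vstack([
        init_positions.reshape(T, J * C),
        np.sqrt(w_vel) * pred_velocity.reshape(T - 1, J * C) * dt,
        np.sqrt(w_acc) * pred_acceleration.reshape(T - 2, J * C) * dt**2,
    ])
    # Each of the J*C columns of b is solved as a separate least-squares problem.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x.reshape(T, J, C)


# Example: noisy initial estimate plus (here, perfect) predicted dynamics.
rng = np.random.default_rng(0)
T, J, fps = 60, 2, 30.0
t = np.arange(T) / fps
gt = np.zeros((T, J, 3))
gt[:, :, 0] = 0.5 * t[:, None]                             # steady forward motion
gt[:, :, 2] = 0.05 * np.sin(2 * np.pi * 2.0 * t)[:, None]  # vertical bounce
noisy = gt + rng.normal(scale=0.02, size=gt.shape)
vel = np.diff(gt, n=1, axis=0) * fps
acc = np.diff(gt, n=2, axis=0) * fps**2

refined = refine_trajectory(noisy, vel, acc, fps=fps)
print(np.linalg.norm(noisy - gt), np.linalg.norm(refined - gt))
```

Multiplying the predictions by Δt and Δt² keeps all three residuals in the same length units, so `w_vel` and `w_acc` act as relative trust weights. Tying the trajectory to PVA-Net's predicted 2D positions, for example through a camera reprojection term, would make the objective nonlinear and call for an iterative solver instead of a single least-squares solve.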

Problem

Research questions and friction points this paper is trying to address.

Human Motion Recovery

Monocular Videos

Temporal Dynamics

Motion Smoothness

Dynamic Consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

high-order temporal dynamics

monocular human motion recovery

temporal transformer