🤖 AI Summary
Reconstructing biomechanically plausible 3D human motion from monocular video, covering both kinematics and dynamics, faces two challenges: oversimplified biomechanical modeling and the absence of physical constraints. To address them, we propose the first physics-constrained inverse-forward cyclic framework. It estimates the parameters of an anatomically accurate musculoskeletal model and optimizes kinematics and dynamics jointly. The method combines a Transformer encoder, differentiable forward kinematics and dynamics layers, and ODE-based physical simulation, and it introduces a forward-inverse consistency loss. This enables, for the first time, physically credible estimation of dynamics (forces and torques) from monocular video alone. On BML-MoVi, BEDLAM, and OpenCap, the approach significantly outperforms state-of-the-art methods, achieving high-fidelity kinematic reconstruction while delivering the first robust monocular recovery of dynamics. It also markedly improves biomechanical plausibility and the causal consistency between forces and the resulting motion.
📝 Abstract
Reconstructing biomechanically realistic 3D human motion, recovering both kinematics (motion) and kinetics (forces), is a critical challenge. Marker-based systems are lab-bound and slow, while popular monocular methods use oversimplified, anatomically inaccurate body models (e.g., SMPL) and ignore physics, which fundamentally limits their biomechanical fidelity. In this work, we introduce MonoMSK, a hybrid framework that bridges data-driven learning and physics-based simulation for biomechanically realistic 3D human motion estimation from monocular video. MonoMSK jointly recovers kinematics (motion) and kinetics (forces and torques) through an anatomically accurate musculoskeletal model. By integrating transformer-based inverse dynamics with differentiable forward kinematics and dynamics layers governed by ODE-based simulation, MonoMSK establishes a physics-regulated inverse-forward loop that enforces biomechanical causality and physical plausibility. A novel forward-inverse consistency loss further aligns motion reconstruction with the underlying kinetic reasoning. Experiments on BML-MoVi, BEDLAM, and OpenCap show that MonoMSK significantly outperforms state-of-the-art methods in kinematic accuracy while, for the first time, enabling precise monocular kinetics estimation.
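The inverse-forward loop described above can be illustrated with a toy example. An inverse step infers torques from motion, a forward step integrates those torques through an ODE, and a consistency loss compares the simulated motion with the original. This is a minimal sketch under strong assumptions, not MonoMSK's implementation. A single hypothetical pendulum-like joint replaces the musculoskeletal model, and analytic finite-difference inverse dynamics stands in for the transformer. Semi-implicit Euler stands in for the ODE solver, and plain NumPy (no autodiff) is used, so the loss is only evaluated, not back-propagated. All function names and constants are hypothetical.

```python
import numpy as np

# Hypothetical 1-DOF joint (a rigid pendulum) standing in for the full
# musculoskeletal model. All values are illustrative assumptions.
INERTIA = 0.5    # kg*m^2, inertia about the pivot
MASS = 2.0       # kg
COM_DIST = 0.4   # m, pivot-to-center-of-mass distance
GRAVITY = 9.81   # m/s^2


def forward_dynamics(q, tau):
    """Joint acceleration from INERTIA * qdd = tau - MASS*GRAVITY*COM_DIST*sin(q)."""
    return (tau - MASS * GRAVITY * COM_DIST * np.sin(q)) / INERTIA


def inverse_dynamics(q, qdd):
    """Torque that produces acceleration qdd at joint angle q."""
    return INERTIA * qdd + MASS * GRAVITY * COM_DIST * np.sin(q)


def torques_from_kinematics(q_ref, qd0, dt):
    """Inverse step: finite-difference the reference angles, then apply inverse dynamics."""
    qd = np.concatenate(([qd0], np.diff(q_ref) / dt))  # velocities at steps 0..N
    qdd = np.diff(qd) / dt                              # accelerations at steps 0..N-1
    return inverse_dynamics(q_ref[:-1], qdd)


def simulate(q0, qd0, taus, dt):
    """Forward step: integrate the equation of motion with semi-implicit Euler."""
    q, qd = q0, qd0
    traj = [q]
    for tau in taus:
        qd = qd + dt * forward_dynamics(q, tau)  # update velocity first
        q = q + dt * qd                          # then position with the new velocity
        traj.append(q)
    return np.array(traj)


def consistency_loss(q_ref, taus, qd0, dt):
    """Mean squared error between reference and forward-simulated joint angles."""
    q_sim = simulate(q_ref[0], qd0, taus, dt)
    return float(np.mean((q_sim - q_ref) ** 2))


# Synthetic reference kinematics, standing in for angles estimated from video.
dt = 0.01
t = np.arange(0.0, 1.0 + dt / 2, dt)
q_ref = 0.3 * np.sin(2 * np.pi * t)
qd0 = 0.3 * 2 * np.pi  # analytic initial angular velocity

taus = torques_from_kinematics(q_ref, qd0, dt)
loss_ok = consistency_loss(q_ref, taus, qd0, dt)          # consistent torques
loss_bad = consistency_loss(q_ref, taus + 0.5, qd0, dt)   # biased torques
```

Torques recovered from the reference motion drive the simulation back onto that motion, so the loss is near zero up to floating-point error. Biased torques cause drift, which the loss penalizes. In a trainable setting, the same loss would be written in an autodiff framework so that gradients flow through the integrator into the torque predictor.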