🤖 AI Summary
This study addresses the amplification of noise that occurs when joint torques are computed via inverse dynamics from 3D human pose estimates obtained from monocular video. It presents a systematic, quantitative analysis of this effect, showing that pose noise is amplified roughly 1,000-fold and that proximal joints (spine, hips) are up to 10x more sensitive to noise than distal joints (wrists, hands). To support the analysis, the authors introduce SMPL-Dynamics, a fully differentiable inverse dynamics module built on the SMPL body model that requires no external physics simulator. Low-pass filtering before differentiation substantially reduces the amplification, and differentiable pose refinement reduces joint torque error by 93% with negligible change in pose.
📝 Abstract
Recent advances in monocular 3D human pose estimation enable accurate body tracking from video. However, translating these kinematic estimates into physical quantities, such as joint torques, remains challenging due to noise amplification through inverse dynamics. In this work, we provide a systematic analysis of how pose estimation noise propagates through the inverse dynamics pipeline. We present three key findings: (1) pose noise is amplified by approximately 1,000x when computing joint torques via numerical differentiation, (2) proximal joints (spine, hips) are up to 10x more sensitive to noise than distal joints (wrists, hands), and (3) low-pass filtering before differentiation substantially reduces this amplification. To enable this analysis, we develop SMPL-Dynamics, a fully differentiable inverse dynamics module for the SMPL body model that requires no external physics simulators. Our module supports end-to-end gradient computation, and we demonstrate this through differentiable pose refinement, which reduces torque error by 93% with negligible change in pose.
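The amplification mechanism in findings (1) and (3) can be illustrated with a minimal one-dimensional sketch. This is not the SMPL-Dynamics module or the paper's experimental setup. It assumes a single hypothetical joint angle sampled at 30 fps, white Gaussian pose noise of 0.01 rad, a central finite difference for acceleration, and a simple moving average standing in for the low-pass filter. Torque is omitted, since for a single rigid segment it would scale the acceleration error by a constant inertia. For white noise, the central second difference has an error standard deviation of sqrt(6)·σ/dt², so the ratio of acceleration error to pose noise at 30 fps is on the order of a few thousand (in units of 1/s²).

```python
import numpy as np

def second_derivative(x, dt):
    """Central finite-difference estimate of the second time derivative."""
    return (x[2:] - 2.0 * x[1:-1] + x[:-2]) / dt**2

def moving_average(x, width):
    """Simple low-pass filter: centered moving average (odd width, no padding)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="valid")

rng = np.random.default_rng(0)

fps = 30.0                    # assumed video frame rate
dt = 1.0 / fps
t = np.arange(0.0, 10.0, dt)  # 10 seconds of frames

# Hypothetical joint angle trajectory (rad) and noisy "pose estimate".
clean = 0.5 * np.sin(2.0 * np.pi * 0.5 * t)
sigma = 0.01                  # assumed pose noise standard deviation (rad)
noisy = clean + rng.normal(0.0, sigma, t.shape)

# Acceleration error without any filtering.
raw_err = np.std(second_derivative(noisy, dt) - second_derivative(clean, dt))
amplification = raw_err / sigma

# Acceleration error when the pose is low-pass filtered before differentiation.
width = 9                     # assumed filter window, in frames
half = width // 2
smoothed = moving_average(noisy, width)
clean_trimmed = clean[half:-half]   # align with the "valid" filter output
filt_err = np.std(
    second_derivative(smoothed, dt) - second_derivative(clean_trimmed, dt)
)

print(f"noise amplification (raw):   {amplification:.0f}")
print(f"acceleration error raw:      {raw_err:.2f} rad/s^2")
print(f"acceleration error filtered: {filt_err:.2f} rad/s^2")
```

The filtered error includes both the residual noise and the bias that smoothing introduces into the true signal, so a wider window is not always better. The window width here is an arbitrary choice for illustration.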