🤖 AI Summary
To address the accuracy bottleneck of monocular video-based 3D human pose estimation in biomechanics applications, this paper proposes a markerless, anatomically plausible end-to-end framework. It introduces a three-part architecture comprising Multi-Query Human Mesh Recovery (MQ-HMR), Neural Inverse Kinematics (NeurIK), and 2D-guided iterative optimization. The framework integrates multi-query deformable Transformers, spatiotemporal graph neural networks, and vertex-level virtual marker modeling, explicitly enforcing anatomical constraints and 2D–3D geometric consistency. Compared to state-of-the-art methods, the approach reduces 3D joint error by 18.7% on mainstream benchmarks. It achieves, for the first time in monocular video settings, clinically usable biomechanical-level accuracy and overcomes the anatomical distortions inherent in conventional parametric models, while requiring neither motion-capture hardware nor manual annotations.
📝 Abstract
Recent advances in 3D human pose estimation from single-camera images and videos have relied on parametric models such as SMPL. However, these models oversimplify anatomical structures, which limits their accuracy in capturing true joint locations and movements and reduces their applicability in biomechanics, healthcare, and robotics. Biomechanically accurate pose estimation, on the other hand, typically requires costly marker-based motion capture systems and optimization techniques in specialized labs. To bridge this gap, we propose BioPose, a novel learning-based framework for predicting biomechanically accurate 3D human pose directly from monocular videos. BioPose comprises three key components: a Multi-Query Human Mesh Recovery model (MQ-HMR), a Neural Inverse Kinematics (NeurIK) model, and a 2D-informed pose refinement technique. MQ-HMR leverages a multi-query deformable transformer to extract multi-scale fine-grained image features, enabling precise human mesh recovery. NeurIK treats the mesh vertices as virtual markers, applying a spatial-temporal network to regress biomechanically accurate 3D poses under anatomical constraints. To further improve the 3D pose estimates, a 2D-informed refinement step optimizes the query tokens during inference by aligning the 3D structure with 2D pose observations. Experiments on benchmark datasets demonstrate that BioPose significantly outperforms state-of-the-art methods. Project website: https://m-usamasaleem.github.io/publication/BioPose/BioPose.html
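The 2D-informed refinement step described in the abstract can be illustrated with a small sketch. The abstract does not specify the loss, camera model, or optimizer, so everything below is an assumption made for illustration: a latent vector `z` stands in for the query tokens, a toy linear decoder (`decoder_w`, `decoder_b`) stands in for the mesh recovery network, the camera is weak-perspective, and the objective is a confidence-weighted squared reprojection error minimized by plain gradient descent. All function and parameter names are hypothetical and are not taken from BioPose.

```python
import numpy as np


def project_weak_perspective(joints3d, scale=1.0, trans=(0.0, 0.0)):
    """Assumed weak-perspective camera: drop depth, then scale and shift x, y."""
    return scale * joints3d[:, :2] + np.asarray(trans)


def reprojection_error(z, decoder_w, decoder_b, keypoints2d, conf, scale=1.0):
    """Confidence-weighted mean squared distance between projected and observed 2D joints."""
    n_joints = keypoints2d.shape[0]
    joints3d = (decoder_w @ z + decoder_b).reshape(n_joints, 3)
    residual = project_weak_perspective(joints3d, scale) - keypoints2d
    return float(np.sum(conf[:, None] * residual ** 2) / n_joints)


def refine_latent(z0, decoder_w, decoder_b, keypoints2d, conf,
                  steps=200, lr=0.1, scale=1.0):
    """Refine a latent vector at inference time by gradient descent.

    z0 is a hypothetical stand-in for the query tokens; the linear decoder
    is a toy replacement for the real network, chosen so the gradient can
    be written out by hand.
    """
    z = z0.copy()
    n_joints = keypoints2d.shape[0]
    for _ in range(steps):
        joints3d = (decoder_w @ z + decoder_b).reshape(n_joints, 3)
        residual = project_weak_perspective(joints3d, scale) - keypoints2d
        # Gradient of the loss w.r.t. the 3D joints: depth gets no gradient
        # under this camera, so only the x and y columns are filled in.
        grad_joints = np.zeros_like(joints3d)
        grad_joints[:, :2] = 2.0 * scale * conf[:, None] * residual / n_joints
        # Chain rule through the linear decoder back to the latent vector.
        z -= lr * (decoder_w.T @ grad_joints.reshape(-1))
    return z
```

In this toy setting, calling `refine_latent` with detected 2D keypoints and their confidences returns a latent vector whose decoded joints reproject closer to the observations; a real system would backpropagate through the full network with an automatic differentiation library rather than a hand-derived gradient.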