🤖 AI Summary
This work addresses the challenge of efficiently reconstructing full 3D human pose—including joint 3D positions and skeletal rotations—from monocular 2D images for sports analytics. We propose the first end-to-end differentiable 2D-to-3D pose lifting framework that eliminates computationally expensive inverse kinematics (IK). Our method performs full 3D pose estimation via a single forward pass of a neural network. We systematically evaluate rotation representations—6D, quaternion, and axis-angle—along with their corresponding loss functions, and support flexible training with or without ground-truth rotation labels. Experiments demonstrate state-of-the-art performance in joint rotation estimation, superior 3D joint localization accuracy compared to Human Mesh Recovery (HMR) models, and a 150× inference speedup over IK-based methods. This work advances accuracy, efficiency, and modeling completeness simultaneously for monocular 3D human pose estimation.
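To make the evaluated rotation representations concrete: the 6D representation encodes a rotation as two 3D vectors that are mapped to an orthonormal 3×3 rotation matrix via Gram-Schmidt, which is continuous and therefore well suited to neural-network regression. The sketch below shows this standard mapping (following Zhou et al., "On the Continuity of Rotation Representations in Neural Networks"); the function name and NumPy implementation are illustrative, not the paper's actual code.

```python
import numpy as np

def rot6d_to_matrix(x):
    """Map a 6D rotation representation (two stacked 3D vectors)
    to a 3x3 rotation matrix via Gram-Schmidt orthogonalization.
    Illustrative sketch of the standard mapping, not the paper's code."""
    a1, a2 = x[:3], x[3:]
    b1 = a1 / np.linalg.norm(a1)           # normalize first column
    a2_proj = a2 - np.dot(b1, a2) * b1     # remove component along b1
    b2 = a2_proj / np.linalg.norm(a2_proj) # orthonormal second column
    b3 = np.cross(b1, b2)                  # third column by cross product
    return np.stack([b1, b2, b3], axis=-1)

# Example: the canonical input recovers the identity rotation
R = rot6d_to_matrix(np.array([1., 0., 0., 0., 1., 0.]))
```

Because the mapping is differentiable everywhere (away from degenerate inputs), a rotation loss such as geodesic distance can be applied directly to the resulting matrices during training.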
📝 Abstract
In sports analytics, accurately capturing both the 3D locations and rotations of body joints is essential for understanding an athlete's biomechanics. While Human Mesh Recovery (HMR) models can estimate joint rotations, they often exhibit lower accuracy in joint localization compared to 3D Human Pose Estimation (HPE) models. Recent work addressed this limitation by combining a 3D HPE model with inverse kinematics (IK) to estimate both joint locations and rotations. However, IK is computationally expensive. To overcome this, we propose a novel 2D-to-3D uplifting model that directly estimates 3D human poses, including joint rotations, in a single forward pass. We investigate multiple rotation representations, loss functions, and training strategies, both with and without access to ground-truth rotations. Our models achieve state-of-the-art accuracy in rotation estimation, run 150 times faster than the IK-based approach, and surpass HMR models in joint localization precision.