🤖 AI Summary
To address the low accuracy and poor robustness of 3D human pose estimation in dance videos—caused by high-dynamic motion, frequent occlusions, and stylized choreography—this work introduces the first end-to-end 3D pose estimation pipeline tailored for dance archival footage. Methodologically, it systematically integrates state-of-the-art monocular 3D pose estimators (e.g., VideoPose3D, PoseFormer) with a customized post-processing module incorporating motion continuity constraints and occlusion-aware reweighting, coupled with an interpretable visualization toolkit. Extensive experiments on a large-scale, multi-genre archival dataset—including ballet, modern dance, and folk dance—reveal that clothing complexity, camera viewpoint, and motion amplitude significantly impact estimation error (increasing mean per-joint position error [MPJPE] by 12.7% on average); our approach reduces MPJPE by 18.3% over baseline models. The code, annotated dataset, and evaluation benchmark are publicly released.
📝 Abstract
The accuracy and efficiency of human body pose estimation depend on the quality of the data to be processed and of the particularities of these data. To demonstrate how dance videos can challenge pose estimation techniques, we proposed a new 3D human body pose estimation pipeline which combined up-to-date techniques and methods that had not been yet used in dance analysis. Second, we performed tests and extensive experimentations from dance video archives, and used visual analytic tools to evaluate the impact of several data parameters on human body pose. Our results are publicly available for research at https://www.couleur.org/articles/arXiv-1-2025/