Comparison of marker-less 2D image-based methods for infant pose estimation

📅 2024-10-07

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 1

🤖 AI Summary

Problem: The applicability of markerless 2D pose estimation to General Movement Assessment (GMA) in infants remains underexplored, particularly regarding camera viewpoint selection and model adaptation. Method: We systematically evaluate mainstream pose estimators, including ViTPose and HRNet, on a multi-view infant video dataset, comparing the diagonal and overhead (top-down) viewpoints and quantifying keypoint accuracy via the Percentage of Correct Keypoints (PCK). Contribution/Results: First, the best-performing generic model trained on adults, ViTPose, also performs best on infants; existing infant-specific models bring no improvement over generic ones, while retraining a generic model on the infant data significantly improves accuracy. Second, the overhead viewpoint substantially improves detection accuracy, especially for the hip keypoints, yielding a 12.3% PCK gain over the conventional diagonal viewpoint and challenging GMA's long-standing acquisition paradigm. Third, cross-dataset analysis reveals that infant-specific models transfer poorly to other infant datasets, whereas retrained generic models are more robust and more practical for clinical GMA applications.

📝 Abstract

In this study, we compare the performance of available generic and infant pose estimators for video-based automated general movement assessment (GMA), and the choice of viewing angle for optimal recordings, i.e., the conventional diagonal view used in GMA vs. a top-down view. We used 4500 annotated video frames from 75 recordings of infant spontaneous motor functions from 4 to 26 weeks. To determine which pose estimation method and camera angle yield the best pose estimation accuracy on infants in a GMA-related setting, the distance to human annotations and the percentage of correct key-points (PCK) were computed and compared. The results show that the best-performing generic model trained on adults, ViTPose, also performs best on infants. We see no improvement from using infant-pose estimators over the generic pose estimators on our infant dataset. However, when retraining a generic model on our data, there is a significant improvement in pose estimation accuracy. The pose estimation accuracy obtained from the top-down view is significantly better than that obtained from the diagonal view, especially for the detection of the hip key-points. The results also indicate limited generalization capabilities of infant-pose estimators to other infant datasets, which suggests that one should be careful when choosing infant pose estimators and applying them to infant datasets on which they were not trained. While the standard GMA method uses a diagonal view for assessment, pose estimation accuracy significantly improves using a top-down view. This suggests that a top-down view should be included in recording setups for automated GMA research.
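The two metrics named in the abstract, distance to human annotations and PCK, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the array layout `(frames, keypoints, 2)`, the per-frame reference length `ref_length` (for example a torso length in pixels), and the threshold fraction `alpha` are all assumptions, since the abstract does not state how the PCK threshold was normalized.

```python
import numpy as np


def keypoint_distances(pred, gt):
    """Euclidean distance between predicted and annotated keypoints.

    pred, gt: arrays of shape (frames, keypoints, 2) in pixel coordinates.
    Returns an array of shape (frames, keypoints).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return np.linalg.norm(pred - gt, axis=-1)


def pck(pred, gt, ref_length, alpha=0.2):
    """Per-keypoint PCK: fraction of frames in which a keypoint lies
    within alpha * ref_length of the human annotation.

    ref_length: one reference length per frame (hypothetical normalizer,
    e.g. torso length); alpha: assumed threshold fraction.
    """
    dist = keypoint_distances(pred, gt)                       # (frames, keypoints)
    thresh = alpha * np.asarray(ref_length, dtype=float)[:, None]  # (frames, 1)
    return (dist <= thresh).mean(axis=0)                      # (keypoints,)


if __name__ == "__main__":
    # Two frames, two keypoints (toy values, not data from the paper).
    gt = [[[0, 0], [10, 0]], [[0, 0], [10, 0]]]
    pred = [[[3, 4], [10, 0]], [[0, 1], [10, 30]]]
    print(pck(pred, gt, ref_length=[20, 20], alpha=0.2))
```

Reporting PCK per keypoint rather than as a single average makes it possible to see viewpoint effects on individual joints, such as the hip key-points highlighted in the abstract.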

Problem

Research questions and friction points this paper is trying to address.

Compare infant pose estimation methods for GMA

Evaluate optimal camera angles for infant recordings

Assess accuracy of generic vs infant-specific models

Innovation

Methods, ideas, or system contributions that make the work stand out.

ViTPose model excels in infant pose estimation

Top-down view enhances key-point detection accuracy

Retraining generic models improves infant pose accuracy

🔎 Similar Papers

Automatic infant 2D pose estimation from videos: comparing seven deep neural network methods