🤖 AI Summary
Robust human keypoint detection in low-resolution thermal imagery remains challenging, particularly for contactless rehabilitation assessment. Method: This paper introduces the Timed Up and Go (TUG) test into thermal vision analysis for the first time and proposes a lightweight pose estimation architecture that pairs a MobileNetV3-Small encoder with a ViTPose decoder. A composite loss function, combining L2 and Object Keypoint Similarity (OKS) terms, is designed to jointly optimize latent-space alignment and heatmap accuracy. Transfer learning is applied on a custom thermal TUG dataset. Contribution/Results: The method achieves AP/AP50/AP75 = 0.861/0.942/0.887 under the OKS metric, outperforming Mask R-CNN and ViTPose-Base. It also reduces parameter count and FLOPs while maintaining high accuracy, indicating potential for clinical deployment in resource-constrained settings.
📝 Abstract
This study presents a novel approach to human keypoint detection in low-resolution thermal images using transfer learning. We introduce the first application of the Timed Up and Go (TUG) test in thermal-image computer vision, establishing a new paradigm for mobility assessment. Our method uses a MobileNetV3-Small encoder and a ViTPose decoder, trained with a composite loss function that balances latent representation alignment and heatmap accuracy. The model is evaluated using the Object Keypoint Similarity (OKS) metric from the COCO Keypoint Detection Challenge. It achieves AP, AP50, and AP75 scores of 0.861, 0.942, and 0.887, respectively, outperforming traditional supervised learning approaches such as Mask R-CNN and ViTPose-Base. The model is also more computationally efficient in terms of parameter count and FLOPs. This research lays a foundation for future clinical applications of thermal imaging in mobility assessment and rehabilitation monitoring.
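For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how OKS is computed for a single person under the COCO keypoint definition. It is not the authors' evaluation code. It assumes the standard 17-keypoint COCO layout and the standard COCO per-keypoint constants, which the abstract does not confirm for the thermal TUG dataset; the function name `oks` and its arguments are illustrative. AP50 and AP75 count a detection as correct when its OKS is at least 0.50 and 0.75 respectively, and AP averages over thresholds from 0.50 to 0.95 in steps of 0.05.

```python
import math

# Standard COCO per-keypoint constants (sigmas) for the 17 body keypoints.
# Assumption: the thermal dataset follows the COCO keypoint layout.
COCO_SIGMAS = [
    0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
    0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089,
]


def oks(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one prediction and one ground truth.

    pred, gt:    sequences of (x, y) keypoint coordinates in pixels
    visibility:  per-keypoint ground-truth flags; only v > 0 is scored
    area:        ground-truth object area in pixels squared (must be > 0)
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2  # squared pixel distance
        k2 = (2.0 * sigma) ** 2               # per-keypoint falloff, k = 2 * sigma
        total += math.exp(-d2 / (2.0 * area * k2))
        count += 1
    return total / count if count else 0.0


# Illustrative data only: 17 synthetic keypoints, all labeled.
gt_points = [(10.0 + i, 20.0 + 2.0 * i) for i in range(17)]
shifted = [(x + 3.0, y) for x, y in gt_points]
all_visible = [2] * 17

perfect_score = oks(gt_points, gt_points, all_visible, area=1000.0)
shifted_score = oks(shifted, gt_points, all_visible, area=1000.0)
```

Because the distance is normalized by object area, the same pixel error costs more on a small subject than on a large one, which matters for low-resolution thermal frames where the person occupies few pixels.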