MedSapiens: Taking a Pose to Rethink Medical Imaging Landmark Detection

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the longstanding reliance of anatomical landmark detection in medical imaging on domain-specific models, which impedes leveraging the representational power of large-scale vision foundation models. We propose MedSapiens, the first adaptation of the human pose estimation foundation model Sapiens to medical landmark detection. The approach employs cross-dataset, multi-stage pretraining followed by few-shot fine-tuning to transfer spatial pose priors from natural imagery to the medical domain, systematically demonstrating the feasibility and superiority of general-purpose pose models for this task and establishing a new strong baseline. MedSapiens achieves state-of-the-art performance across multiple medical imaging benchmarks, improving the average success detection rate (SDR) by up to 5.26% over generic vision models and up to 21.81% over the best prior domain-specific methods. Notably, it retains a 2.69% SDR gain over existing approaches even in few-shot settings.

📝 Abstract
This paper does not introduce a novel architecture; instead, it revisits a fundamental yet overlooked baseline: adapting human-centric foundation models for anatomical landmark detection in medical imaging. While landmark detection has traditionally relied on domain-specific models, the emergence of large-scale pre-trained vision models presents new opportunities. In this study, we investigate the adaptation of Sapiens, a human-centric foundation model designed for pose estimation, to medical imaging through multi-dataset pretraining, establishing a new state of the art across multiple datasets. Our proposed model, MedSapiens, demonstrates that human-centric foundation models, inherently optimized for spatial pose localization, provide strong priors for anatomical landmark detection, yet this potential has remained largely untapped. We benchmark MedSapiens against existing state-of-the-art models, achieving up to 5.26% improvement over generalist models and up to 21.81% improvement over specialist models in the average success detection rate (SDR). To further assess MedSapiens' adaptability to novel downstream tasks with few annotations, we evaluate its performance in limited-data settings, achieving 2.69% improvement over the few-shot state of the art in SDR. Code and model weights are available at https://github.com/xmed-lab/MedSapiens.
Problem

Research questions and friction points this paper is trying to address.

Adapting human-centric foundation models for anatomical landmark detection
Investigating Sapiens pose estimation model for medical imaging applications
Improving landmark detection accuracy across multiple medical datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapting human-centric pose estimation model to medical imaging
Using multi-dataset pretraining for anatomical landmark detection
Achieving state-of-the-art performance with limited annotations
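The success detection rate (SDR) reported throughout can be sketched in plain Python. The decoding step shown here (channel-wise heatmap argmax scaled by the network stride) is the standard keypoint-heatmap pipeline used in pose estimation and is an illustrative assumption, not a detail taken from the paper; the stride value and toy heatmaps are likewise hypothetical.

```python
import math

def argmax2d(hm):
    """Return the (x, y) position of the maximum value in a 2-D heatmap
    given as a list of rows."""
    best_v, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(hm):
        for x, v in enumerate(row):
            if v > best_v:
                best_v, best_xy = v, (x, y)
    return best_xy

def decode_heatmaps(heatmaps, stride=4):
    """Decode K predicted heatmaps to K (x, y) landmarks at input
    resolution by scaling each channel's argmax by the output stride."""
    return [(x * stride, y * stride) for x, y in map(argmax2d, heatmaps)]

def sdr(pred, gt, radius_px):
    """Success detection rate: fraction of landmarks whose Euclidean
    error is within radius_px of the ground-truth position."""
    hits = sum(1 for (px, py), (gx, gy) in zip(pred, gt)
               if math.hypot(px - gx, py - gy) <= radius_px)
    return hits / len(gt)

# Toy check: three 16x16 heatmaps with single peaks, stride 4.
def peak(h, w, x, y):
    hm = [[0.0] * w for _ in range(h)]
    hm[y][x] = 1.0
    return hm

heatmaps = [peak(16, 16, 5, 2), peak(16, 16, 8, 8), peak(16, 16, 0, 15)]
pred = decode_heatmaps(heatmaps)       # [(20, 8), (32, 32), (0, 60)]
gt = [(20, 8), (32, 30), (10, 60)]
print(sdr(pred, gt, radius_px=4))      # 2 of 3 landmarks within 4 px
```

In the landmark-detection literature SDR is typically reported at several radii (e.g. 2 mm, 2.5 mm, 3 mm, 4 mm after pixel-to-mm conversion), which this sketch supports by calling `sdr` with different `radius_px` values.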