Learning Anatomy from Multiple Perspectives via Self-supervision in Chest Radiographs

📅 2025-12-28

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

Current self-supervised learning methods for medical imaging neglect the anatomical consistency, coherence, and hierarchical structure of the human body, leading to suboptimal anatomical representation learning. To address this, we propose the first framework that systematically encodes three fundamental anatomical priors—consistency, coherence, and hierarchy—as multi-view self-supervised signals, enabling anatomy-aware foundation model training on large-scale chest X-ray data. Our method integrates anatomy-guided data augmentation, multi-view contrastive learning, and hierarchical feature alignment to explicitly enforce anatomical constraints during representation learning. Evaluated on ten diverse downstream clinical tasks, our approach consistently outperforms ten state-of-the-art baseline models; after fine-tuning, it achieves an average AUC improvement of 3.2%. Moreover, the learned representations exhibit enhanced generalizability, robustness to distribution shifts, and improved clinical interpretability—validated through attention visualization and expert evaluation.

Technology Category

Application Category

📝 Abstract

Foundation models have been successful in natural language processing and computer vision because they are capable of capturing the underlying structures (foundation) of natural languages. However, in medical imaging, the key foundation lies in human anatomy, as these images directly represent the internal structures of the body, reflecting the consistency, coherence, and hierarchy of human anatomy. Yet, existing self-supervised learning (SSL) methods often overlook these perspectives, limiting their ability to effectively learn anatomical features. To overcome the limitation, we built Lamps (learning anatomy from multiple perspectives via self-supervision) pre-trained on large-scale chest radiographs by harmoniously utilizing the consistency, coherence, and hierarchy of human anatomy as the supervision signal. Extensive experiments across 10 datasets evaluated through fine-tuning and emergent property analysis demonstrate Lamps' superior robustness, transferability, and clinical potential when compared to 10 baseline models. By learning from multiple perspectives, Lamps presents a unique opportunity for foundation models to develop meaningful, robust representations that are aligned with the structure of human anatomy.

Problem

Research questions and friction points this paper is trying to address.

Develops self-supervised learning to capture anatomical structures in chest X-rays

Addresses limitations of existing methods by using consistency, coherence, and hierarchy

Enhances model robustness and transferability for clinical applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning from anatomical consistency, coherence, hierarchy

Pre-training on large-scale chest radiographs using multi-perspective anatomy

Superior robustness and transferability demonstrated across 10 datasets

🔎 Similar Papers

Multi-modal vision-language model for generalizable annotation-free pathology localization and clinical diagnosis