🤖 AI Summary
Current self-supervised learning methods for medical imaging neglect the anatomical consistency, coherence, and hierarchical structure of the human body, leading to suboptimal anatomical representation learning. To address this, we propose the first framework that systematically encodes three fundamental anatomical priors—consistency, coherence, and hierarchy—as multi-view self-supervised signals, enabling anatomy-aware foundation model training on large-scale chest X-ray data. Our method integrates anatomy-guided data augmentation, multi-view contrastive learning, and hierarchical feature alignment to explicitly enforce anatomical constraints during representation learning. Evaluated on ten diverse downstream clinical tasks, our approach consistently outperforms ten state-of-the-art baseline models; after fine-tuning, it achieves an average AUC improvement of 3.2%. Moreover, the learned representations exhibit enhanced generalizability, robustness to distribution shifts, and improved clinical interpretability—validated through attention visualization and expert evaluation.
📝 Abstract
Foundation models have been successful in natural language processing and computer vision because they are capable of capturing the underlying structures (foundation) of natural languages. However, in medical imaging, the key foundation lies in human anatomy, as these images directly represent the internal structures of the body, reflecting the consistency, coherence, and hierarchy of human anatomy. Yet, existing self-supervised learning (SSL) methods often overlook these perspectives, limiting their ability to effectively learn anatomical features. To overcome the limitation, we built Lamps (learning anatomy from multiple perspectives via self-supervision) pre-trained on large-scale chest radiographs by harmoniously utilizing the consistency, coherence, and hierarchy of human anatomy as the supervision signal. Extensive experiments across 10 datasets evaluated through fine-tuning and emergent property analysis demonstrate Lamps' superior robustness, transferability, and clinical potential when compared to 10 baseline models. By learning from multiple perspectives, Lamps presents a unique opportunity for foundation models to develop meaningful, robust representations that are aligned with the structure of human anatomy.