ModAn-MulSupCon: Modality-and Anatomy-Aware Multi-Label Supervised Contrastive Pretraining for Medical Imaging

📅 2025-08-25
🤖 AI Summary
Supervised pretraining in medical imaging is hindered by the scarcity of expert annotations, while readily available metadata—such as imaging modality and anatomical region—remain underutilized. To address this, the authors propose ModAn-MulSupCon, which encodes modality and anatomy information as multi-hot vectors and introduces a Jaccard-weighted multi-label supervised contrastive loss for metadata-driven representation learning. A ResNet-18 backbone is pretrained on the miniRIN subset of RadImageNet, and transfer performance is evaluated via linear probing and fine-tuning. With fine-tuning, ModAn-MulSupCon achieves an AUC of 0.964 on ACL tear detection and 0.763 on thyroid nodule malignancy classification, significantly outperforming established baselines. These results demonstrate strong cross-task transferability under label scarcity and validate structured metadata as a practical supervisory signal for medical image representation learning.

📝 Abstract
Background and objective: Expert annotations limit large-scale supervised pretraining in medical imaging, while ubiquitous metadata (modality, anatomical region) remain underused. We introduce ModAn-MulSupCon, a modality- and anatomy-aware multi-label supervised contrastive pretraining method that leverages such metadata to learn transferable representations.

Method: Each image's modality and anatomy are encoded as a multi-hot vector. A ResNet-18 encoder is pretrained on a mini subset of RadImageNet (miniRIN, 16,222 images) with a Jaccard-weighted multi-label supervised contrastive loss, and then evaluated by fine-tuning and linear probing on three binary classification tasks: ACL tear (knee MRI), lesion malignancy (breast ultrasound), and nodule malignancy (thyroid ultrasound).

Result: With fine-tuning, ModAn-MulSupCon achieved the best AUC on MRNet-ACL (0.964) and Thyroid (0.763), surpassing all baselines ($p<0.05$), and ranked second on Breast (0.926) behind SimCLR (0.940; not significant). With the encoder frozen, SimCLR/ImageNet were superior, indicating that ModAn-MulSupCon representations benefit most from task adaptation rather than linear separability.

Conclusion: Encoding readily available modality/anatomy metadata as multi-label targets provides a practical, scalable pretraining signal that improves downstream accuracy when fine-tuning is feasible. ModAn-MulSupCon is a strong initialization for label-scarce clinical settings, whereas SimCLR/ImageNet remain preferable for frozen-encoder deployments.
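The core idea—multi-hot encoding of modality/anatomy metadata and a Jaccard-weighted supervised contrastive objective—can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`multi_hot`, `jaccard_weighted_supcon`) and the exact weight normalization are assumptions made for clarity.

```python
import numpy as np

def multi_hot(modality_idx, anatomy_idx, n_modalities, n_anatomies):
    """Encode one image's modality and anatomy as a single multi-hot vector."""
    v = np.zeros(n_modalities + n_anatomies)
    v[modality_idx] = 1.0
    v[n_modalities + anatomy_idx] = 1.0
    return v

def jaccard_matrix(Y):
    """Pairwise Jaccard similarity between rows of a multi-hot label matrix."""
    inter = Y @ Y.T                                        # |y_i ∩ y_j|
    union = Y.sum(1)[:, None] + Y.sum(1)[None, :] - inter  # |y_i ∪ y_j|
    return inter / np.maximum(union, 1e-12)

def jaccard_weighted_supcon(Z, Y, tau=0.1):
    """Jaccard-weighted multi-label supervised contrastive loss (sketch).

    Z: (N, d) L2-normalized embeddings; Y: (N, k) multi-hot labels.
    Each positive pair's log-probability is weighted by label Jaccard overlap.
    """
    N = Z.shape[0]
    sim = (Z @ Z.T) / tau
    mask = ~np.eye(N, dtype=bool)                 # exclude self-pairs
    shifted = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    exp_sim = np.where(mask, np.exp(shifted), 0.0)
    log_prob = shifted - np.log(exp_sim.sum(axis=1, keepdims=True))
    W = jaccard_matrix(Y) * mask                  # soft positive weights
    w_sum = np.maximum(W.sum(axis=1), 1e-12)
    per_anchor = -(W * log_prob).sum(axis=1) / w_sum
    return float(per_anchor.mean())
```

Unlike standard SupCon, where a pair is either positive or negative, partial metadata overlap (e.g. same anatomy, different modality) contributes a fractional weight, which is what the multi-hot encoding makes possible.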
Problem

Research questions and friction points this paper is trying to address.

Leveraging metadata to overcome limited expert annotations in medical imaging
Learning transferable representations using modality and anatomy information
Improving downstream accuracy through multi-label supervised contrastive pretraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging metadata as multi-hot vector targets
Using multi-label supervised contrastive loss
Pretraining with modality-anatomy aware representations
Eichi Takaya
Tohoku University Hospital
Artificial Intelligence, Machine Learning, Computer Vision
Ryusei Inamori
Department of Diagnostic Imaging, Tohoku University Graduate School of Medicine, Miyagi, Japan