🤖 AI Summary
Supervised pretraining for medical imaging is hindered by the scarcity of expert annotations, while readily available metadata—such as imaging modality and anatomical region—remain underused. ModAn-MulSupCon addresses this by encoding each image's modality and anatomy as a multi-hot vector and training with a Jaccard-weighted multi-label supervised contrastive loss, turning annotation-free metadata into a pretraining signal. A ResNet-18 backbone is pretrained on the miniRIN subset of RadImageNet and evaluated by fine-tuning and linear probing on three downstream tasks. With fine-tuning, ModAn-MulSupCon achieves the best AUC on ACL tear detection (0.964) and thyroid nodule malignancy classification (0.763), significantly outperforming the baselines, and ranks a close second on breast lesion malignancy; with a frozen encoder, SimCLR and ImageNet pretraining remain stronger. These results support metadata-driven supervised contrastive pretraining as a practical, scalable initialization for label-scarce clinical tasks where fine-tuning is feasible.
📝 Abstract
Background and objective: Expert annotations limit large-scale supervised pretraining in medical imaging, while ubiquitous metadata (modality, anatomical region) remain underused. We introduce ModAn-MulSupCon, a modality- and anatomy-aware multi-label supervised contrastive pretraining method that leverages such metadata to learn transferable representations.
Method: Each image's modality and anatomy are encoded as a multi-hot vector. A ResNet-18 encoder is pretrained on a mini subset of RadImageNet (miniRIN, 16,222 images) with a Jaccard-weighted multi-label supervised contrastive loss, and then evaluated by fine-tuning and linear probing on three binary classification tasks: ACL tear (knee MRI), lesion malignancy (breast ultrasound), and nodule malignancy (thyroid ultrasound).
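The core idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the label vocabularies below are hypothetical, and the exact form of the Jaccard weighting (here, Jaccard overlap used as soft positive-pair weights inside a standard SupCon-style log-softmax) and the temperature are assumptions.

```python
import numpy as np

# Hypothetical label vocabularies for illustration; the actual modality and
# anatomy sets come from RadImageNet metadata and may differ.
MODALITIES = ["CT", "MRI", "US"]
ANATOMIES = ["knee", "breast", "thyroid"]

def multi_hot(modality, anatomy):
    """Encode one image's metadata as a single multi-hot target vector."""
    v = np.zeros(len(MODALITIES) + len(ANATOMIES))
    v[MODALITIES.index(modality)] = 1.0
    v[len(MODALITIES) + ANATOMIES.index(anatomy)] = 1.0
    return v

def jaccard_matrix(Y):
    """Pairwise Jaccard similarity between rows of a multi-hot label matrix."""
    inter = Y @ Y.T
    union = Y.sum(1)[:, None] + Y.sum(1)[None, :] - inter
    return inter / np.maximum(union, 1e-12)

def jaccard_weighted_supcon(Z, Y, tau=0.1):
    """One plausible form of a Jaccard-weighted multi-label SupCon loss.

    Z: (N, d) L2-normalized embeddings; Y: (N, k) multi-hot metadata labels.
    Each pair (i, j) counts as a positive in proportion to the Jaccard
    overlap of its label vectors, instead of a hard same-class indicator.
    """
    sim = (Z @ Z.T) / tau
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    m = sim.max(axis=1, keepdims=True)                # stabilize log-softmax
    log_prob = sim - m - np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    W = jaccard_matrix(Y)
    np.fill_diagonal(W, 0.0)                          # no self-positives
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    lp = np.where(np.isfinite(log_prob), log_prob, 0.0)
    return -(W * lp).sum(axis=1).mean()
```

For example, an MRI knee image and an MRI breast image share one of three active labels (Jaccard 1/3), so they act as a partial positive pair, while an MRI knee and an ultrasound thyroid image share nothing and act as pure negatives.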
Result: With fine-tuning, ModAn-MulSupCon achieved the best AUC on MRNet-ACL (0.964) and Thyroid (0.763), surpassing all baselines ($p<0.05$), and ranked second on Breast (0.926) behind SimCLR (0.940; difference not significant). With the encoder frozen, SimCLR/ImageNet were superior, indicating that ModAn-MulSupCon's representations benefit most from task adaptation rather than out-of-the-box linear separability.
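The distinction between the two evaluation protocols is that linear probing trains only a classifier head on frozen encoder features, whereas fine-tuning also updates the encoder. A minimal sketch of the linear-probe side, assuming plain logistic regression trained by gradient descent on precomputed features (the paper's actual probe configuration is not specified here):

```python
import numpy as np

def linear_probe(F, y, lr=0.5, steps=500):
    """Linear probing: fit a logistic-regression head on frozen features.

    F: (N, d) features from a frozen pretrained encoder; y: (N,) binary
    labels. Only the linear head (w, b) is trained by full-batch gradient
    descent on binary cross-entropy; the encoder is never updated.
    """
    N, d = F.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        z = np.clip(F @ w + b, -30.0, 30.0)   # logits, clipped for stability
        p = 1.0 / (1.0 + np.exp(-z))          # sigmoid predictions
        g = p - y                             # dBCE/dlogits
        w -= lr * (F.T @ g) / N
        b -= lr * g.mean()
    return w, b
```

Because the head is linear, this protocol measures how linearly separable the frozen features already are, which is exactly where SimCLR/ImageNet pretraining retained an edge in the reported results.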
Conclusion: Encoding readily available modality/anatomy metadata as multi-label targets provides a practical, scalable pretraining signal that improves downstream accuracy when fine-tuning is feasible. ModAn-MulSupCon is a strong initialization for label-scarce clinical settings, whereas SimCLR/ImageNet remain preferable for frozen-encoder deployments.