🤖 AI Summary
This work addresses the poor transferability of pretrained Vision Transformers (ViTs) to cardiac magnetic resonance (MR) sequence classification, which stems from their lack of domain-specific knowledge. To overcome this, the authors propose a self-supervised contrastive learning adaptation method tailored to cardiac MR images, which adapts ViT models without requiring labeled data. By integrating large-batch training with domain-customized data augmentation, the approach improves model generalization. Evaluated on the four most common cardiac MR sequences, the adapted model achieves a classification AUC exceeding 0.75 on each, outperforming conventional supervised training strategies. Furthermore, the adapted model generalizes well to external MR datasets such as BraTS and ADNI.
📝 Abstract
Vision Transformer (ViT) models, which rely on self-attention mechanisms, have demonstrated robust generalization across various vision tasks, including image classification. However, these models are typically pretrained on general public datasets and often lack the specialized domain knowledge necessary for medical imaging applications. In this study, we investigate the adaptation of ViT models to cardiac magnetic resonance (MR) images using an in-house dataset. We find that pretrained ViT features do not transfer effectively to the cardiac MR domain. To overcome this limitation, we introduce an adaptation strategy based on image-based self-supervised contrastive learning, which outperforms traditional supervised training approaches. Moreover, our adapted ViT model exhibits strong generalization to external MR datasets such as BraTS and ADNI. Through ablation studies, we further investigate the impact of batch size and dataset scale on performance. Ultimately, our adapted model achieves a classification AUC exceeding 0.75 across the four most common cardiac MR sequences.
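The abstract does not say which contrastive objective is used, so the following is only an illustrative sketch. It assumes a SimCLR-style NT-Xent loss over two augmented views of each image; the function name `nt_xent_loss`, the temperature value, and the embedding shapes are hypothetical and not taken from the paper. It also suggests one plausible reason batch size can matter in an ablation: with N images per batch, each embedding is contrasted against 2N - 2 negatives.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.1):
    """NT-Xent loss for paired embeddings z_a, z_b of shape (N, D).

    Row i of z_a and row i of z_b are assumed to come from two augmented
    views of the same image (an assumption of this sketch, not a detail
    confirmed by the abstract).
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit rows, so dot product = cosine similarity
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative

    # The positive for row i is the other view of the same image.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_norm

    return float(-log_prob[np.arange(2 * n), pos].mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 16))
    view_b = view_a + 0.01 * rng.normal(size=(8, 16))  # nearly identical second view
    print("loss with matching views:", nt_xent_loss(view_a, view_b))
    print("loss with unrelated views:", nt_xent_loss(view_a, rng.normal(size=(8, 16))))
```

In an actual adaptation run, `z_a` and `z_b` would be the ViT's projected embeddings of two augmentations of the same batch, and the loss would be minimized with an autodiff framework rather than plain NumPy.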