🤖 AI Summary
Contrastive learning in medical imaging is hindered by poor generalizability and by the loss of representation quality under strong augmentations. This work first uncovers the feature collapse mechanism induced by strong augmentations in retinal image contrastive learning, supporting the principle that “weaker augmentations yield superior representations.” It proposes a gradient-tunable, lightweight weak-augmentation strategy tailored to medical imaging characteristics, including micro-cropping and low-intensity color jitter, and integrates it into the SimCLR framework. Pretraining and cross-dataset transfer evaluation are conducted across six multi-center retinal datasets, including MESSIDOR2. Results show consistent and statistically significant improvements: on MESSIDOR2, AUROC increases by 0.010 (0.838 → 0.848) and AUPR by 0.074 (0.523 → 0.597), and all five additional clinical datasets show statistically significant gains. These findings support the method’s stability and clinical generalizability.
📝 Abstract
Contrastive learning, a prominent approach within self-supervised learning, has demonstrated significant effectiveness in developing generalizable models for various applications involving natural images. However, recent research indicates that these successes do not necessarily extend to the medical imaging domain. In this paper, we investigate the reasons for this suboptimal performance and hypothesize that the dense distribution of medical images poses challenges to the pretext tasks in contrastive learning, particularly in constructing positive and negative pairs. We explore model performance under different augmentation strategies and compare the results to those achieved with strong augmentations. Our study includes six publicly available datasets covering multiple clinically relevant tasks. We further assess the model's generalizability through external evaluations. The model pre-trained with weak augmentation outperforms those with strong augmentation, improving AUROC from 0.838 to 0.848 and AUPR from 0.523 to 0.597 on MESSIDOR2, and showing similar enhancements across other datasets. Our findings suggest that optimizing the scale of augmentation is critical for enhancing the efficacy of contrastive learning in medical imaging.
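To make the notion of augmentation "scale" concrete, the following is a minimal sketch of how a strong and a weak setting might differ when sampling the two views of a positive pair. It is not the authors' implementation: the names `AugmentConfig`, `STRONG`, `WEAK`, `sample_view`, and `positive_pair` are hypothetical, and the numeric ranges are illustrative assumptions rather than values reported in the paper. Only crop geometry and a brightness factor are sampled; applying them to pixels is left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class AugmentConfig:
    crop_scale: tuple  # (min, max) fraction of the image area kept by the crop
    jitter: float      # maximum relative brightness change


# Assumed, illustrative settings; the paper's actual values are not given here.
STRONG = AugmentConfig(crop_scale=(0.08, 1.0), jitter=0.8)
WEAK = AugmentConfig(crop_scale=(0.85, 1.0), jitter=0.1)


def sample_view(width, height, cfg, rng):
    """Sample a crop box and a brightness factor for one augmented view."""
    area_frac = rng.uniform(*cfg.crop_scale)
    side = area_frac ** 0.5  # keep the aspect ratio; scale both sides equally
    crop_w = max(1, round(width * side))
    crop_h = max(1, round(height * side))
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    brightness = rng.uniform(1.0 - cfg.jitter, 1.0 + cfg.jitter)
    return (left, top, crop_w, crop_h), brightness


def positive_pair(width, height, cfg, rng):
    """Two independently augmented views of the same image form a positive pair."""
    return sample_view(width, height, cfg, rng), sample_view(width, height, cfg, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    for name, cfg in (("strong", STRONG), ("weak", WEAK)):
        print(name, positive_pair(512, 512, cfg, rng))
```

Under the weak setting both views retain most of the image and nearly its original brightness, so fine-grained structures are likely to appear in both views. Under the strong setting a view may cover under a tenth of the image, which is one way the positive-pair construction can become unreliable for densely distributed medical images.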