🤖 AI Summary
This study investigates the generalization capability of Vision Transformers (ViTs) for diagnosing intermediate-to-advanced age-related macular degeneration (AMD) from fundus images, addressing the central question: *Does ophthalmology-specific pretraining outperform generic natural-image pretraining?* We systematically evaluate six self-supervised ViT models across seven multi-center, expert-annotated fundus datasets (70,000 images). Contrary to prevailing assumptions, we find that the natural-image pretrained iBOT ViT achieves superior cross-distribution AMD diagnosis performance (AUROC: 0.80–0.97), significantly surpassing ophthalmology-specific pretraining (0.78–0.96) and no-pretraining baselines (0.68–0.91). Our contributions include: (1) demonstrating that general-purpose vision foundation models exhibit strong clinical diagnostic generalizability in ophthalmology; (2) releasing BRAMD, the first Brazilian multi-center AMD dataset (n=587); and (3) establishing a standardized ViT evaluation benchmark for cross-center generalization in retinal disease analysis.
📄 Abstract
Self-supervised learning (SSL) has enabled Vision Transformers (ViTs) to learn robust representations from large-scale natural image datasets, enhancing their generalization across domains. In retinal imaging, foundation models pretrained on either natural or ophthalmic data have shown promise, but the benefit of in-domain pretraining remains uncertain. To investigate this, we benchmark six SSL-pretrained ViTs on seven digital fundus image (DFI) datasets totaling 70,000 expert-annotated images for the task of moderate-to-late age-related macular degeneration (AMD) identification. Our results show that iBOT pretrained on natural images achieves the highest out-of-distribution generalization, with AUROCs of 0.80–0.97, outperforming domain-specific models (AUROCs of 0.78–0.96) and a baseline ViT-L trained from scratch (AUROCs of 0.68–0.91). These findings highlight the value of foundation models in improving AMD identification and challenge the assumption that in-domain pretraining is necessary. Furthermore, we release BRAMD, an open-access dataset (n=587) of DFIs with AMD labels from Brazil.
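The headline numbers above are ranges of per-dataset AUROCs: each model is scored separately on every held-out center, and the minimum and maximum are reported. A minimal sketch of that evaluation step, assuming per-center binary AMD labels and model probabilities are available (the center names and values below are illustrative placeholders, not the paper's data):

```python
# Sketch of cross-center AUROC evaluation: score a model separately on
# each held-out center, then report the min-max range, as in the
# "AUROC 0.80-0.97" style summaries above. Data here is toy/hypothetical.
from sklearn.metrics import roc_auc_score


def evaluate_across_centers(per_center_scores):
    """Compute AUROC per center and the (min, max) range across centers.

    per_center_scores maps a center name to (y_true, y_score), where
    y_true are binary AMD labels and y_score are model probabilities.
    """
    aurocs = {
        center: roc_auc_score(y_true, y_score)
        for center, (y_true, y_score) in per_center_scores.items()
    }
    return aurocs, (min(aurocs.values()), max(aurocs.values()))


# Two hypothetical centers with placeholder labels and probabilities.
scores = {
    "center_A": ([0, 0, 1, 1, 1, 0], [0.1, 0.3, 0.8, 0.7, 0.9, 0.2]),
    "center_B": ([1, 0, 1, 0], [0.6, 0.4, 0.3, 0.5]),
}
aurocs, (lo, hi) = evaluate_across_centers(scores)
# Reporting the range rather than a pooled AUROC exposes centers where
# the model generalizes poorly instead of averaging them away.
```

Reporting per-center ranges rather than a single pooled AUROC is what makes weak generalization to a particular distribution visible in the summary.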