🤖 AI Summary
This study investigates the generalization capability of Vision Transformers (ViTs) for diagnosing intermediate-to-advanced age-related macular degeneration (AMD) from fundus images, addressing the central question: *Does ophthalmology-specific pretraining outperform generic natural-image pretraining?* We systematically evaluate six self-supervised ViT models across seven multi-center, expert-annotated fundus datasets (70,000 images). Contrary to prevailing assumptions, we find that the natural-image pretrained iBOT ViT achieves superior cross-distribution AMD diagnosis performance (AUROC: 0.80–0.97), significantly surpassing ophthalmology-specific pretraining (0.78–0.96) and no-pretraining baselines (0.68–0.91). Our contributions include: (1) demonstrating that general-purpose vision foundation models exhibit strong clinical diagnostic generalizability in ophthalmology; (2) releasing BRAMD, the first Brazilian multi-center AMD dataset (n=587); and (3) establishing a standardized ViT evaluation benchmark for cross-center generalization in retinal disease analysis.
📄 Abstract
Self-supervised learning (SSL) has enabled Vision Transformers (ViTs) to learn robust representations from large-scale natural image datasets, enhancing their generalization across domains. In retinal imaging, foundation models pretrained on either natural or ophthalmic data have shown promise, but the benefit of in-domain pretraining remains uncertain. To investigate this, we benchmark six SSL-pretrained ViTs on seven digital fundus image (DFI) datasets totaling 70,000 expert-annotated images for the task of moderate-to-late age-related macular degeneration (AMD) identification. Our results show that iBOT pretrained on natural images achieves the highest out-of-distribution generalization, with AUROCs of 0.80–0.97, outperforming domain-specific models (AUROCs of 0.78–0.96) and a baseline ViT-L trained from scratch (AUROCs of 0.68–0.91). These findings highlight the value of foundation models in improving AMD identification and challenge the assumption that in-domain pretraining is necessary. Furthermore, we release BRAMD, an open-access dataset (n=587) of DFIs with AMD labels from Brazil.
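The headline numbers above are ranges of per-dataset AUROCs: each model is scored separately on every held-out center, and the minimum and maximum are reported. A minimal sketch of that evaluation step, assuming per-center binary AMD labels and model probabilities are available (the center names and values below are illustrative placeholders, not the paper's data):

```python
# Sketch of cross-center AUROC evaluation: score a model separately on
# each held-out center, then report the min-max range, as in the
# "AUROC 0.80-0.97" style summaries above. Data here is toy/hypothetical.
from sklearn.metrics import roc_auc_score


def evaluate_across_centers(per_center_scores):
    """Compute AUROC per center and the (min, max) range across centers.

    per_center_scores maps a center name to (y_true, y_score), where
    y_true are binary AMD labels and y_score are model probabilities.
    """
    aurocs = {
        center: roc_auc_score(y_true, y_score)
        for center, (y_true, y_score) in per_center_scores.items()
    }
    return aurocs, (min(aurocs.values()), max(aurocs.values()))


# Two hypothetical centers with placeholder labels and probabilities.
scores = {
    "center_A": ([0, 0, 1, 1, 1, 0], [0.1, 0.3, 0.8, 0.7, 0.9, 0.2]),
    "center_B": ([1, 0, 1, 0], [0.6, 0.4, 0.3, 0.5]),
}
aurocs, (lo, hi) = evaluate_across_centers(scores)
# Reporting the range rather than a pooled AUROC exposes centers where
# the model generalizes poorly instead of averaging them away.
```

Reporting per-center ranges rather than a single pooled AUROC is what makes weak generalization to a particular distribution visible in the summary.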