Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases?

📅 2025-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: It remains unclear whether general-purpose or domain-specific foundation models are better suited to ocular disease detection and systemic disease prediction from retinal images. Method: The authors fine-tuned DINOv2 (a general-purpose vision foundation model, in large, base, and small variants) and RETFound (a retina-specific model) on eight open-source ocular datasets plus the Moorfields AlzEye and UK Biobank datasets, compared them by AUROC with statistical significance testing, and repeated the comparison using only 10% of the fine-tuning data. Contribution/Results: DINOv2-large outperformed RETFound in diabetic retinopathy detection (AUROC 0.850-0.952 vs 0.823-0.944) and multi-class eye disease classification, and DINOv2-base outperformed it in glaucoma detection. Conversely, RETFound outperformed all DINOv2 models in predicting heart failure, myocardial infarction, and ischaemic stroke (AUROC 0.732-0.796 vs 0.663-0.771). The results support choosing a foundation model according to the task rather than assuming that either a larger general-purpose model or a domain-specific model is always better.
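The reduced-data comparison mentioned above uses 10% of the fine-tuning data. The source does not say how that subset was drawn, so the following is only a minimal sketch of one common choice, class-stratified sampling. The function name `stratified_subset` and its parameters are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def stratified_subset(labels, fraction=0.10, seed=0):
    """Return sorted indices of a class-stratified subset of the data.

    Assumption: stratifying by label is an illustrative choice; the paper's
    actual 10% sampling procedure is not described in the abstract.
    """
    rng = random.Random(seed)

    # Group example indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    chosen = []
    for indices in by_class.values():
        # Keep at least one example per class so that no class disappears.
        k = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy label vector: 900 negatives and 100 positives.
labels = [0] * 900 + [1] * 100
subset = stratified_subset(labels, fraction=0.10)
print(len(subset), sum(labels[i] for i in subset))
```

Stratifying keeps the disease prevalence in the subset close to that of the full set, which matters for rare outcomes where a uniform 10% draw could leave very few positive cases.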

📝 Abstract

The advent of foundation models (FMs) is transforming the medical domain. In ophthalmology, RETFound, a retina-specific FM pre-trained sequentially on 1.4 million natural images and 1.6 million retinal images, has demonstrated high adaptability across clinical applications. Conversely, DINOv2, a general-purpose vision FM pre-trained on 142 million natural images, has shown promise in non-medical domains. However, its applicability to clinical tasks remains underexplored. To address this, we conducted head-to-head evaluations by fine-tuning RETFound and three DINOv2 models (large, base, small) for ocular disease detection and systemic disease prediction tasks, across eight standardized open-source ocular datasets, as well as the Moorfields AlzEye and the UK Biobank datasets. The DINOv2-large model outperformed RETFound in detecting diabetic retinopathy (AUROC=0.850-0.952 vs 0.823-0.944, across three datasets, all P≤0.007) and multi-class eye diseases (AUROC=0.892 vs 0.846, P<0.001). For glaucoma, the DINOv2-base model outperformed RETFound (AUROC=0.958 vs 0.940, P<0.001). Conversely, RETFound achieved superior performance over all DINOv2 models in predicting heart failure, myocardial infarction, and ischaemic stroke (AUROC=0.732-0.796 vs 0.663-0.771, all P<0.001). These trends persisted even with 10% of the fine-tuning data. These findings showcase the distinct scenarios where general-purpose and domain-specific FMs excel, highlighting the importance of aligning FM selection with task-specific requirements to optimise clinical performance.
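The comparisons above rest on AUROC differences between two models scored on the same test images. The sketch below shows how AUROC can be computed from ranks and how a paired bootstrap gives an interval for the difference between two models. The abstract does not state which significance test the authors used, so the bootstrap here is an illustrative stand-in, and the function names `auroc` and `paired_bootstrap_diff` are hypothetical.

```python
import random


def auroc(y_true, scores):
    """AUROC via the rank-sum (Mann-Whitney) identity; tied scores count as 0.5.

    y_true holds 0/1 integer labels; scores holds the model outputs.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank within the tie
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def paired_bootstrap_diff(y_true, scores_a, scores_b, n_boot=1000, seed=0):
    """Percentile 95% interval for AUROC(a) - AUROC(b) on the same cases.

    Assumption: a paired bootstrap is used for illustration only; it is not
    claimed to be the test behind the P values reported in the paper.
    """
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need both classes to bootstrap AUROC")
    rng = random.Random(seed)
    diffs = []
    while len(diffs) < n_boot:
        # Resample cases with replacement, keeping both models' scores paired.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [y_true[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain a single class
            a = auroc(y, [scores_a[i] for i in idx])
            b = auroc(y, [scores_b[i] for i in idx])
            diffs.append(a - b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]


# Toy example with made-up labels and scores (not data from the paper).
y = [0, 0, 1, 1, 0, 1, 0, 1]
model_a = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.3, 0.7]
model_b = [0.3, 0.6, 0.40, 0.5, 0.2, 0.7, 0.5, 0.6]
print(auroc(y, model_a), auroc(y, model_b))
print(paired_bootstrap_diff(y, model_a, model_b, n_boot=200))
```

Resampling cases rather than scores keeps the two models paired on identical images, which is what makes the interval for the difference narrower than comparing two independent AUROC intervals.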

Problem

Research questions and friction points this paper is trying to address.

Compare general-purpose and retina-specific foundation models

Evaluate models for ocular and systemic disease detection

Determine optimal model selection for clinical tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning general-purpose vision models

Comparing retina-specific and natural image models

Evaluating models on multiple disease datasets
