Generalist versus Specialist Vision Foundation Models for Ocular Disease and Oculomics

📅 2025-09-03
🤖 AI Summary
This study compares general-purpose vision foundation models (DINOv2, DINOv3) with retinal domain-specific models (RETFound-MAE, RETFound-DINOv2) on ocular disease detection and oculomics tasks, the latter meaning systemic disease prediction from retinal images. Using fine-tuning and linear probing as adaptation strategies, the authors find that although generalist models adapt well as they scale, RETFound-DINOv2 consistently outperforms them, showing stronger generalisability and data efficiency. The results suggest that retinal-domain pre-training remains the most effective choice for clinical applications, while the narrowing gap indicates that continued data and model scaling could make generalist models strong starting points for future medical foundation models.

📝 Abstract
Medical foundation models, pre-trained with large-scale clinical data, demonstrate strong performance in diverse clinically relevant applications. RETFound, trained on nearly one million retinal images, exemplifies this approach in applications with retinal images. However, the emergence of increasingly powerful and multifold larger generalist foundation models such as DINOv2 and DINOv3 raises the question of whether domain-specific pre-training remains essential, and if so, what gap persists. To investigate this, we systematically evaluated the adaptability of DINOv2 and DINOv3 in retinal image applications, compared to two specialist RETFound models, RETFound-MAE and RETFound-DINOv2. We assessed performance on ocular disease detection and systemic disease prediction using two adaptation strategies: fine-tuning and linear probing. Data efficiency and adaptation efficiency were further analysed to characterise trade-offs between predictive performance and computational cost. Our results show that although scaling generalist models yields strong adaptability across diverse tasks, RETFound-DINOv2 consistently outperforms these generalist foundation models in ocular-disease detection and oculomics tasks, demonstrating stronger generalisability and data efficiency. These findings suggest that specialist retinal foundation models remain the most effective choice for clinical applications, while the narrowing gap with generalist foundation models suggests that continued data and model scaling can deliver domain-relevant gains and position them as strong foundations for future medical foundation models.
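The two adaptation strategies named in the abstract differ in which parameters are updated: linear probing trains only a new prediction head on frozen pretrained features, while fine-tuning also updates the pretrained encoder. The sketch below illustrates that distinction on a toy problem. It is not the authors' pipeline: the two-unit tanh "encoder", the `adapt` function, the learning rate, and the synthetic data are all hypothetical stand-ins for a pretrained vision backbone and retinal images.

```python
import math

def backbone(x, W):
    """Toy 'pretrained encoder' (assumption): one tanh layer, 2-d input -> 2-d features."""
    return [math.tanh(W[j][0] * x[0] + W[j][1] * x[1]) for j in range(2)]

def predict(x, W, head):
    """Return the sigmoid probability and the intermediate features."""
    f = backbone(x, W)
    z = head[0] * f[0] + head[1] * f[1] + head[2]  # head = [w0, w1, bias]
    return 1.0 / (1.0 + math.exp(-z)), f

def adapt(data, W, head, train_backbone, lr=0.1, epochs=100):
    """SGD on the logistic loss.

    train_backbone=False -> linear probing (encoder frozen, head trained)
    train_backbone=True  -> fine-tuning (encoder and head both trained)
    """
    W = [row[:] for row in W]  # copy so the caller's pretrained weights are not mutated
    head = head[:]
    for _ in range(epochs):
        for x, y in data:
            p, f = predict(x, W, head)
            g = p - y  # gradient of the loss w.r.t. the logit z
            if train_backbone:
                # Backpropagate through the head and tanh into the encoder weights.
                for j in range(2):
                    gf = g * head[j] * (1.0 - f[j] ** 2)
                    W[j][0] -= lr * gf * x[0]
                    W[j][1] -= lr * gf * x[1]
            # The head is always trained.
            head[0] -= lr * g * f[0]
            head[1] -= lr * g * f[1]
            head[2] -= lr * g
    return W, head

# Hypothetical "pretrained" weights and a tiny synthetic binary task.
pretrained_W = [[0.5, -0.3], [0.2, 0.8]]
init_head = [0.0, 0.0, 0.0]
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 1.0], 1), ([-1.0, 0.5], 0)]

W_lp, head_lp = adapt(data, pretrained_W, init_head, train_backbone=False)
W_ft, head_ft = adapt(data, pretrained_W, init_head, train_backbone=True)

print("encoder changed under linear probing:", W_lp != pretrained_W)
print("encoder changed under fine-tuning:  ", W_ft != pretrained_W)
```

In this toy setting linear probing updates 3 parameters and fine-tuning updates 7. With a large pretrained encoder the gap in trainable parameters is far wider, which is why the abstract treats adaptation efficiency as a trade-off against predictive performance.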
Problem

Research questions and friction points this paper is trying to address.

Evaluating adaptability of generalist vs specialist vision models
Assessing performance in ocular disease detection tasks
Comparing data efficiency and computational cost trade-offs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Specialist RETFound-DINOv2 outperforms generalist models
Systematic evaluation of two adaptation strategies: fine-tuning and linear probing
Domain-specific pre-training remains the most effective choice for clinical applications
Yukun Zhou
Institute of Ophthalmology, University College London; Hawkes Institute, University College London; NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust
Paul Nderitu
Institute of Ophthalmology, University College London; NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust
Jocelyn Hui Lin Goh
Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore; Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore; Singapore Eye Research Institute, Singapore National Eye Centre
Justin Engelmann
Institute of Ophthalmology, University College London; NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust
Siegfried K. Wagner
Institute of Ophthalmology, University College London; NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust
Anran Ran
Department of Ophthalmology and Visual Sciences, Chinese University of Hong Kong
Hongyang Jiang
Department of Ophthalmology and Visual Sciences, Chinese University of Hong Kong
Lie Ju
University College London; Moorfields Eye Hospital; Monash University
Computer Vision, Medical Image Analysis, Ophthalmology
Ke Zou
Apple, Inc
Power electronics, Switched-capacitor Converter, Power Semiconductor Devices
Sahana Srinivasan
Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore; Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore
Hyunmin Kim
Institute of Ophthalmology, University College London; NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust
Takahiro Ninomiya
Institute of Ophthalmology, University College London; NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust
Zheyuan Wang
PhD Candidate, ECE, Georgia Institute of Technology
Robotics, Reinforcement Learning, Graph Neural Networks, Computer Vision, Bio-signal Processing
Gabriel Dawei Yang
Singapore Eye Research Institute, Singapore National Eye Centre; Department of Ophthalmology and Visual Sciences, Chinese University of Hong Kong
Eden Ruffell
Institute of Ophthalmology, University College London; NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust
Dominic Williamson
Institute of Ophthalmology, University College London; NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust
Rui Santos
Department of Ophthalmology, Stadtspital Zürich; Spross Research Institute
Gabor Mark Somfai
Department of Ophthalmology, Stadtspital Zürich; Spross Research Institute
Carol Y. Cheung
Department of Ophthalmology and Visual Sciences, Chinese University of Hong Kong
Tien Yin Wong
Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore; Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore; School of Clinical Medicine, Tsinghua Medicine, Tsinghua University; Beijing Visual Science and Translational Eye Research Institute, Beijing Tsinghua Changgung Hospital
Daniel C. Alexander
Professor of Imaging Science, Centre for Medical Image Computing, Department of Computer Science
Computer science, Machine learning, Medical imaging, diffusion MRI, Neuroscience
Yih Chung Tham
Yong Loo Lin School of Medicine, National University of Singapore; Singapore Eye Research Institute
Ophthalmology, Epidemiology, Visual Impairment, Deep Learning
Pearse A. Keane
Institute of Ophthalmology, University College London; NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust