VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge

📅 2024-08-05
🏛️ arXiv.org
📈 Citations: 7
Influential: 0
🤖 AI Summary
In resource-constrained settings, ophthalmic diagnosis is impeded by shortages of specialists and imaging equipment. To address this, we propose VisionUnite, a vision-language foundation model for ophthalmology enhanced with clinical knowledge. VisionUnite is pretrained on 1.24 million image-text pairs and then fine-tuned on MMFundus, a dataset of 296,379 fundus image-text pairs and 889,137 simulated doctor-patient dialogue instances. Architecturally, VisionUnite couples a Vision Transformer (ViT) with a large language model (LLM), augmented with domain-specific clinical knowledge. Experiments show that VisionUnite outperforms GPT-4V and Gemini Pro and achieves diagnostic capabilities comparable to those of junior ophthalmologists. It supports open-ended multi-disease diagnosis, clinical explanation, and patient interaction, which makes it suitable for initial ophthalmic disease screening and as an educational aid for junior ophthalmologists.

📝 Abstract
The need for improved diagnostic methods in ophthalmology is acute, especially in less developed regions with limited access to specialists and advanced equipment. Therefore, we introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge. VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs, and further refined using our proposed MMFundus dataset, which includes 296,379 high-quality fundus image-text pairs and 889,137 simulated doctor-patient dialogue instances. Our experiments indicate that VisionUnite outperforms existing generative foundation models such as GPT-4V and Gemini Pro. It also demonstrates diagnostic capabilities comparable to junior ophthalmologists. VisionUnite performs well in various clinical scenarios including open-ended multi-disease diagnosis, clinical explanation, and patient interaction, making it a highly versatile tool for initial ophthalmic disease screening. VisionUnite can also serve as an educational aid for junior ophthalmologists, accelerating their acquisition of knowledge regarding both common and rare ophthalmic conditions. VisionUnite represents a significant advancement in ophthalmology, with broad implications for diagnostics, medical education, and understanding of disease mechanisms.
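
As a quick consistency check on the MMFundus figures in the abstract, the short Python snippet below adds the two reported counts and divides one by the other. The dialogue count turns out to be exactly three times the number of fundus image-text pairs. That is consistent with three simulated dialogue instances per fundus image, but the abstract does not state this, so treat the per-image reading as an assumption rather than a documented property of the dataset.

```python
# MMFundus counts exactly as reported in the abstract.
FUNDUS_PAIRS = 296_379        # fundus image-text pairs
DIALOGUE_INSTANCES = 889_137  # simulated doctor-patient dialogue instances

# Total number of MMFundus instances of both kinds.
total = FUNDUS_PAIRS + DIALOGUE_INSTANCES

# Integer quotient and remainder of dialogues per image-text pair.
# ASSUMPTION: reading the quotient as "dialogues generated per fundus image"
# is an interpretation; the abstract only gives the two totals.
per_image, remainder = divmod(DIALOGUE_INSTANCES, FUNDUS_PAIRS)

print(f"total MMFundus instances: {total:,}")
print(f"dialogue instances per image-text pair: {per_image} (remainder {remainder})")
```

A zero remainder means the two totals are in an exact 3:1 ratio; summing them also assumes each dialogue instance counts as a separate sample, which the abstract implies but does not spell out.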

Problem

Research questions and friction points this paper is trying to address.

Improving ophthalmology diagnostics in resource-limited regions
Enhancing vision-language models with clinical knowledge
Supporting multi-disease diagnosis and medical education

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language model enhanced with clinical knowledge
Pretrained on 1.24 million image-text pairs
Outperforms GPT-4V and Gemini Pro

Zihan Li
University of Washington
Foundation Model, AI for Healthcare, Multimodal Learning
Diping Song
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China; Department of Bioengineering, University of Washington, Seattle, 98195, USA
Zefeng Yang
State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
Deming Wang
State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
Fei Li
State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
Xiulan Zhang
State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University, Guangdong Provincial Key Laboratory of Ophthalmology and Visual Science, Guangdong Provincial Clinical Research Center for Ocular Diseases, Guangzhou 510060, China
P. E. Kinahan
Department of Bioengineering, University of Washington, Seattle, 98195, USA; Department of Radiology, University of Washington, Seattle, 98195, USA
Yu Qiao
Shanghai Artificial Intelligence Laboratory, Shanghai, 200232, China