VOLMO: Versatile and Open Large Models for Ophthalmology

πŸ“… 2026-03-25
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing general-purpose and medical multimodal large language models perform poorly on ophthalmology-specific tasks, and few open-source, domain-specialized alternatives exist. To address this gap, this work proposes VOLMO-2B, a model-agnostic, open-data multimodal large language model framework tailored for ophthalmology. It introduces a three-stage training strategy: knowledge-rich pretraining on large-scale medical image-text pairs from the literature, multitask fine-tuning on multi-disease annotated data, and clinical chain-of-thought refinement using real-world case reports. VOLMO-2B achieves an average F1 score of 87.4% across 12 eye conditions, outperforms strong baselines in image captioning and clinical recommendation generation, and generalizes robustly to three independent external cohorts for age-related macular degeneration and diabetic retinopathy.

πŸ“ Abstract
Vision impairment affects millions globally, and early detection is critical to preventing irreversible vision loss. Ophthalmology workflows require clinicians to integrate medical images, structured clinical data, and free-text notes to determine disease severity and management, which is time-consuming and burdensome. Recent multimodal large language models (MLLMs) show promise, but existing general and medical MLLMs perform poorly in ophthalmology, and few ophthalmology-specific MLLMs are openly available. We present VOLMO (Versatile and Open Large Models for Ophthalmology), a model-agnostic, data-open framework for developing ophthalmology-specific MLLMs. VOLMO includes three stages: ophthalmology knowledge pretraining on 86,965 image-text pairs from 26,569 articles across 82 journals; domain task fine-tuning on 26,929 annotated instances spanning 12 eye conditions for disease screening and severity classification; and multi-step clinical reasoning on 913 patient case reports for assessment, planning, and follow-up care. Using this framework, we trained a compact 2B-parameter MLLM and compared it with strong baselines, including InternVL-2B, LLaVA-Med-7B, MedGemma-4B, MedGemma-27B, and RETFound. We evaluated these models on image description generation, disease screening and staging classification, and assessment-and-management generation, with additional manual review by two healthcare professionals and external validation on three independent cohorts for age-related macular degeneration and diabetic retinopathy. Across settings, VOLMO-2B consistently outperformed baselines, achieving stronger image description performance, an average F1 of 87.4% across 12 eye conditions, and higher scores in external validation.
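The headline classification metric, an average F1 of 87.4% across 12 eye conditions, is a macro-average: each condition's F1 is computed separately and the scores are averaged with equal weight, so rare conditions count as much as common ones. A minimal sketch of that computation (the condition labels and predictions below are purely illustrative, not from the paper):

```python
def macro_f1(y_true, y_pred, classes):
    """Macro-averaged F1: mean of per-class F1 scores,
    weighting every class equally regardless of prevalence."""
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Toy run with 3 of the 12 conditions (hypothetical labels).
y_true = ["AMD", "DR", "glaucoma", "AMD", "DR", "glaucoma"]
y_pred = ["AMD", "DR", "glaucoma", "DR", "DR", "glaucoma"]
print(round(macro_f1(y_true, y_pred, ["AMD", "DR", "glaucoma"]), 3))
```

In practice the same quantity is usually obtained with `sklearn.metrics.f1_score(..., average="macro")`; the explicit loop above just makes the per-condition accounting visible.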
Problem

Research questions and friction points this paper is trying to address:
ophthalmology, multimodal large language models, early detection, vision impairment, clinical workflow
Innovation

Methods, ideas, or system contributions that make the work stand out:
multimodal large language model, ophthalmology-specific AI, open framework, clinical reasoning, domain pretraining
Zhenyue Qin
Department of Biomedical Informatics & Data Science, Yale University
Younjoon Chung
Yale University
machine learning, deep learning, medical image processing
Elijah Lee
Department of Biomedical Informatics & Data Science, Yale University
Wanyue Feng
Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University
Xuguang Ai
Biomedical Informatics & Data Science, Yale University
AI in Healthcare, Data Science, NLP, Biomedical Informatics
Serina Applebaum
Department of Biomedical Informatics & Data Science, Yale University
Minjie Zou
Yong Loo Lin School of Medicine, National University of Singapore
Yang Liu
Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University
Pan Xiao
Department of Radiology, Washington University in Saint Louis
Mac Singer
Department of Biomedical Informatics & Data Science, Yale University
Amisha Dave
Department of Biomedical Informatics & Data Science, Yale University
Aidan Gilson
Massachusetts Eye and Ear, Harvard Medical School
Ophthalmology, Machine Learning, Artificial Intelligence
Tiarnan D. L. Keenan
Staff Clinician, National Eye Institute, National Institutes of Health
Ophthalmology
Emily Y. Chew
National Eye Institute, National Institutes of Health
Zhiyong Lu
Senior Investigator, NLM; Adjunct Professor of CS, UIUC
BioNLP, Biomedical Informatics, Medical AI, Artificial Intelligence
Yih-Chung Tham
Yong Loo Lin School of Medicine, National University of Singapore
Ron Adelman
Department of Biomedical Informatics & Data Science, Yale University
Luciano V. Del Priore
Department of Biomedical Informatics & Data Science, Yale University
Qingyu Chen
Biomedical Informatics & Data Science, Yale University; NCBI-NLM, National Institutes of Health
Text mining, Machine learning, Data curation, BioNLP, Medical Imaging Analysis