Language Enhanced Model for Eye (LEME): An Open-Source Ophthalmology-Specific Large Language Model

📅 2024-10-01
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
The ophthalmology domain has long lacked open-source, domain-specific large language models (LLMs). To address this gap, we introduce LEME, an open-source ophthalmology LLM built upon Llama2 70B and fine-tuned on ~127,000 non-copyrighted training instances curated from ophthalmic case reports, abstracts, and open-source study materials. The fine-tuning follows a reproducible, multi-task domain-adaptation paradigm targeting clinical question answering, electronic health record (EHR) summarization, and medical examination response generation. Benchmarked against eight other LLMs, including GPT-4 and Meditron 70B, LEME led most internal and external evaluations: abstract completion reached Rouge-L = 0.20 ± 0.03, long-form QA reached 0.19 ± 0.01, and expert-rated EHR summarization scored up to 4.83/5. These results advance both the clinical relevance and accessibility of AI in ophthalmology.

📝 Abstract
Large Language Models (LLMs) are poised to revolutionize healthcare, yet ophthalmology-specific LLMs remain scarce and underexplored. We introduce an open-source, specialized LLM for ophthalmology, termed Language Enhanced Model for Eye (LEME). LEME was built upon the pre-trained Llama2 70B model and further fine-tuned with a corpus of ~127,000 non-copyrighted training instances curated from ophthalmology-specific case reports, abstracts, and open-source study materials. We benchmarked LEME against eight other LLMs, namely, GPT-3.5, GPT-4, three Llama2 models (7B, 13B, 70B), PMC-LLAMA 13B, Meditron 70B, and EYE-Llama (another ophthalmology-specific LLM). Evaluations included four internal validation tasks: abstract completion, fill-in-the-blank, multiple-choice questions (MCQ), and short-answer QA. External validation tasks encompassed long-form QA, MCQ, patient EHR summarization, and clinical QA. Evaluation metrics included Rouge-L scores, accuracy, and expert evaluation of correctness, completeness, and readability. In internal validations, LEME consistently outperformed its counterparts, achieving Rouge-L scores of 0.20 ± 0.03 in abstract completion (all p<0.05), 0.82 ± 0.04 in fill-in-the-blank (all p<0.0001), and 0.22 ± 0.05 in short-answer QA (all p<0.0001, except versus GPT-4). In external validations, LEME excelled in long-form QA with a Rouge-L of 0.19 ± 0.01 (all p<0.0001), ranked second in MCQ accuracy (0.68 ± 0.09; all p<0.0001), and scored highest in EHR summarization and clinical QA (4.24 to 4.83 out of 5 for correctness, completeness, and readability). LEME's emphasis on robust fine-tuning and the use of non-copyrighted data represents a breakthrough in open-source ophthalmology-specific LLMs, offering the potential to revolutionize the execution of clinical tasks while democratizing research collaboration.
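Rouge-L, the headline metric in the abstract-completion and QA evaluations above, scores lexical overlap via the longest common subsequence (LCS) between a model's output and a reference text. The following is a minimal sketch of the LCS-based F-measure variant; it is an illustrative implementation, not the paper's actual evaluation code:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b,
    computed with standard dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(reference: str, candidate: str) -> float:
    """Rouge-L F1: harmonic mean of LCS-based precision and recall."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Because it matches a subsequence rather than contiguous n-grams, Rouge-L rewards outputs that preserve the reference's wording and order without requiring exact phrase overlap, which suits free-form tasks like abstract completion and long-form QA.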
Problem

Research questions and friction points this paper is trying to address.

Developing open-source specialized LLMs for ophthalmology clinical applications
Addressing scarcity of ophthalmology-specific models with advanced reasoning capabilities
Validating model performance through comprehensive clinical evaluation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned Llama2 70B with ophthalmology-specific training data
Used non-copyrighted case reports and abstracts for training
Validated through clinical tasks and expert evaluations
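The fine-tuning recipe above turns curated case reports and abstracts into instruction-response training instances for a causal LM. A sketch of how such an instance might be serialized into a single training string follows; the template layout and field names are illustrative assumptions, not LEME's published prompt format:

```python
# Illustrative Alpaca-style template; LEME's actual prompt format is not
# documented here, so treat this layout as an assumption.
PROMPT_TEMPLATE = (
    "Below is an instruction describing an ophthalmology task, paired with "
    "further context.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n{response}"
)

def format_instance(instruction: str, context: str, response: str) -> str:
    """Serialize one (instruction, context, response) triple into the flat
    text string a causal-LM fine-tuning loop would train on."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction.strip(),
        input=context.strip(),
        response=response.strip(),
    )

example = format_instance(
    "Summarize the key findings of this case report.",
    "A 67-year-old patient presented with acute angle-closure glaucoma...",
    "The report describes acute angle-closure glaucoma managed with...",
)
```

Flattening each instance this way lets one multi-task corpus (QA, summarization, exam-style questions) share a single training loop, with the task expressed entirely in the instruction text.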
Aidan Gilson
Massachusetts Eye and Ear, Harvard Medical School
Ophthalmology · Machine Learning · Artificial Intelligence
Xuguang Ai
Biomedical Informatics & Data Science, Yale University
AI in Healthcare · Data Science · NLP · Biomedical Informatics
Qianqian Xie
Wuhan University
NLP · LLM
Sahana Srinivasan
Singapore Eye Research Institute, Singapore National Eye Centre, Singapore
Krithi Pushpanathan
Research Associate, National University of Singapore
Ophthalmology · Artificial Intelligence
Maxwell Singer
Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, USA
Jimin Huang
The Fin AI
computational finance
Hyunjae Kim
Yale University
Natural Language Processing · Biomedical Informatics · Healthcare
Erping Long
Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
Peixing Wan
Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
L. V. D. Priore
Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, USA
Lucila Ohno-Machado
University of California San Diego
Biomedical Informatics · Predictive Modeling
Hua Xu
Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, USA
Dianbo Liu
Assistant professor, National University of Singapore
Push the limits of human · Machine learning · Biomedical sciences
Ron A. Adelman
Department of Ophthalmology and Visual Science, Yale School of Medicine, Yale University, New Haven, USA
Yih-Chung Tham
Ophthalmology and Visual Science Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
Qingyu Chen
Biomedical Informatics & Data Science, Yale University; NCBI-NLM, National Institutes of Health
Text mining · Machine learning · Data curation · BioNLP · Medical Imaging Analysis