Glaucoma Detection and Structured OCT Report Generation via a Fine-tuned Multimodal Large Language Model

📅 2025-10-01
🤖 AI Summary
This study introduces an interpretable multimodal large language model (MLLM) for glaucoma screening that jointly performs quality assessment of optical coherence tomography (OCT) optic nerve head circle scans and structured clinical report generation. The authors fine-tune Llama 3.2 Vision-Instruct on paired OCT images and automatically generated structured reports, which include a glaucoma diagnosis and retinal nerve fiber layer (RNFL) thinning assessments across seven anatomical sectors, and evaluate it using accuracy, sensitivity, specificity, precision, F1-score, BLEU, ROUGE, and BERTScore. The key contribution is applying an MLLM to jointly model OCT quality triage and sector-wise reporting, so that disease classification and sector-level localization are produced together. On a held-out test set, the model reaches an image quality classification accuracy of 0.90 (specificity 0.98), a glaucoma detection accuracy of 0.86 (sensitivity 0.91, specificity 0.73, F1 0.91), and per-sector RNFL thinning accuracy between 0.83 and 0.94; the generated reports align closely with the automatically generated reference reports (BLEU 0.82, ROUGE-L 0.92, BERTScore-F1 0.99).

📝 Abstract
Objective: To develop an explainable multimodal large language model (MM-LLM) that (1) screens optic nerve head (ONH) OCT circle scans for quality and (2) generates structured clinical reports that include glaucoma diagnosis and sector-wise retinal nerve fiber layer (RNFL) thinning assessments.
Design: Retrospective cohort study of 1,310 subjects contributing 43,849 Spectralis ONH OCT circle scans (1,331 glaucomatous and 867 healthy eyes) from the DIGS and ADAGES cohorts.
Methods: An MM-LLM (Llama 3.2 Vision-Instruct model) was fine-tuned to generate clinical descriptions of OCT imaging data. Training data included paired OCT images and automatically generated, structured clinical reports that described global and sectoral RNFL thinning. Poor-quality scans were labeled as unusable and paired with a fixed refusal statement. The model was evaluated on a held-out test set for three tasks: quality assessment, glaucoma detection, and RNFL thinning classification across seven anatomical sectors. Evaluation metrics included accuracy, sensitivity, specificity, precision, and F1-score. The quality of the generated descriptions was also evaluated using standard text evaluation metrics.
Results: The model achieved 0.90 accuracy and 0.98 specificity for quality triage. For glaucoma detection, accuracy was 0.86 (sensitivity 0.91, specificity 0.73, F1-score 0.91). RNFL thinning prediction accuracy ranged from 0.83 to 0.94, with the highest performance in the global and temporal sectors. Text generation scores showed strong alignment with reference reports (BLEU: 0.82; ROUGE-1: 0.94; ROUGE-2: 0.87; ROUGE-L: 0.92; BERTScore-F1: 0.99).
Conclusions: The fine-tuned MM-LLM generated accurate clinical descriptions based on OCT imaging. The model achieved high accuracy in identifying image quality issues and detecting glaucoma. The model also provided sectoral descriptions of RNFL thinning to help support clinical OCT evaluation.
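The classification metrics named in the abstract (accuracy, sensitivity, specificity, precision, F1-score) all follow from the four cells of a binary confusion matrix. The sketch below shows one way to compute them; it is not the authors' evaluation code, the names `BinaryMetrics` and `binary_metrics` are hypothetical, and the labels in the usage lines are toy values, not data from the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # recall on the positive class
    specificity: float  # recall on the negative class
    precision: float
    f1: float


def _ratio(num: int, den: int) -> float:
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute metrics from 0/1 labels, where 1 is the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    return BinaryMetrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        sensitivity=sensitivity,
        specificity=specificity,
        precision=precision,
        f1=f1,
    )


# Toy labels for illustration only (1 = glaucoma, 0 = healthy).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(binary_metrics(toy_true, toy_pred))
```

Because F1 is the harmonic mean of precision and sensitivity, the reported sensitivity of 0.91 and F1-score of 0.91 would, taken at face value, imply a precision close to 0.91 as well; rounding in the reported figures makes this only approximate.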
Problem

Research questions and friction points this paper is trying to address.

Automating glaucoma detection from optic nerve OCT scans
Generating structured clinical reports for retinal nerve fiber thinning
Assessing OCT scan quality through multimodal AI analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned multimodal LLM for OCT image analysis
Automated structured report generation for glaucoma diagnosis
Quality triage and sectoral RNFL thinning assessment
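The abstract describes training targets that are either a structured report (diagnosis plus sector-wise RNFL thinning) or a fixed refusal statement for unusable scans. A minimal sketch of how such a target string could be assembled follows. Everything concrete in it is an assumption: the sector names follow the common seven-sector Spectralis layout rather than anything listed in the source, and the refusal wording, report template, and function name `build_report` are hypothetical.

```python
from typing import Mapping

# Assumed sector names (common Spectralis layout); the source only says "seven sectors".
SECTORS = (
    "global",
    "temporal",
    "temporal-superior",
    "temporal-inferior",
    "nasal",
    "nasal-superior",
    "nasal-inferior",
)

# Hypothetical wording; the source does not give the actual refusal statement.
REFUSAL = "This scan is of insufficient quality for interpretation."


def build_report(usable: bool, glaucoma: bool, thinning: Mapping[str, bool]) -> str:
    """Build a target report string for one scan.

    Unusable scans always map to the same fixed refusal text, so the model
    can learn quality triage as part of report generation.
    """
    if not usable:
        return REFUSAL

    missing = [s for s in SECTORS if s not in thinning]
    if missing:
        raise ValueError(f"missing thinning labels for sectors: {missing}")

    lines = [f"Diagnosis: {'glaucoma' if glaucoma else 'healthy'}."]
    for sector in SECTORS:
        status = "thinning" if thinning[sector] else "no thinning"
        lines.append(f"RNFL {sector}: {status}.")
    return "\n".join(lines)


# Toy example, not study data: thinning flagged in two sectors.
flags = {s: s in ("global", "temporal-inferior") for s in SECTORS}
print(build_report(usable=True, glaucoma=True, thinning=flags))
```

Emitting one fixed line per sector keeps the output easy to parse back into per-sector labels, which is one plausible way to score sector-level accuracy from generated text.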
Jalil Jalili
Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA.
Yashraj Gavhane
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Evan Walker
Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA.
Anna Heinke
University of California San Diego
Christopher Bowd
Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA.
Akram Belghith
Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA.
Massimo A. Fazio
Department of Ophthalmology and Vision Sciences, University of Alabama at Birmingham, Birmingham, AL
Christopher A. Girkin
Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA.
C. Gustavo De Moraes
Columbia University Medical Center
Medicine, Ophthalmology, Glaucoma
Jeffrey M. Liebmann
Department of Ophthalmology, Harkness Eye Institute, Bernard and Shirlee Brown Glaucoma Research Laboratory, New York, NY, United States.
Sally L. Baxter
Associate Professor, University of California San Diego
Ophthalmology, Biomedical Informatics
Robert N. Weinreb
Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA.
Linda M. Zangwill
Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA.
Mark Christopher
Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology, Shiley Eye Institute, University of California San Diego, La Jolla, CA, USA.