🤖 AI Summary
This work proposes OcularChat, an interactive diagnostic framework for age-related macular degeneration (AMD) that combines a multimodal large language model with clinical reasoning and explanation, capabilities largely absent from existing retinal disease models, which produce static predictions. OcularChat is fine-tuned from Qwen2.5-VL on 705,850 simulated patient-physician dialogues paired with 46,167 color fundus photographs, and uses visual question answering to deliver interpretable, interactive diagnostic support. On AREDS, the model achieves accuracies of 0.954, 0.849, and 0.678 for advanced AMD, pigmentary abnormalities, and drusen size, respectively, significantly outperforming existing multimodal large language models; on AREDS2 it remains the top-performing method on all three tasks. In subjective assessment by three independent ophthalmologist graders, it also scores higher than a strong baseline model.
📝 Abstract
Despite the strong performance of deep learning models in retinal disease detection, most systems produce static predictions without clinical reasoning or interactive explanation. Recent advances in multimodal large language models (MLLMs) make it possible to integrate diagnostic predictions with clinically meaningful dialogue to support clinical decision-making and patient counseling. In this study, OcularChat, an MLLM, was fine-tuned from Qwen2.5-VL using simulated patient-physician dialogues to diagnose age-related macular degeneration (AMD) through visual question answering on color fundus photographs (CFPs). A total of 705,850 simulated dialogues paired with 46,167 CFPs were generated to train OcularChat to identify key AMD features and produce reasoned predictions. On AREDS, OcularChat demonstrated strong classification performance, achieving accuracies of 0.954, 0.849, and 0.678 on three diagnostic tasks (advanced AMD, pigmentary abnormalities, and drusen size, respectively) and significantly outperforming existing MLLMs. On AREDS2, OcularChat remained the top-performing method on all tasks. Across three independent ophthalmologist graders, OcularChat achieved higher mean scores than a strong baseline model for advanced AMD (3.503 vs. 2.833), pigmentary abnormalities (3.272 vs. 2.828), drusen size (3.064 vs. 2.433), and overall impression (2.978 vs. 2.464) on a 5-point clinical grading rubric. Beyond strong objective performance in AMD severity classification, OcularChat demonstrated the ability to provide diagnostic reasoning, clinically relevant explanations, and interactive dialogue, with high performance in subjective ophthalmologist evaluation. These findings suggest that MLLMs may enable accurate, interpretable, and clinically useful image-based diagnosis and classification of AMD.
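The abstract reports two kinds of numbers: per-task classification accuracy and mean rubric scores across three graders. The Python sketch below shows one plausible way such figures could be computed. It is not the authors' evaluation code. The task keys, label strings, and grader names are hypothetical, the toy data is invented for illustration, and pooling all grader scores before averaging is an assumption, since the abstract does not say how the means were aggregated.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Compute accuracy separately for each diagnostic task.

    records: iterable of (task, predicted_label, reference_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for task, predicted, reference in records:
        total[task] += 1
        correct[task] += int(predicted == reference)
    return {task: correct[task] / total[task] for task in total}


def mean_grader_score(scores_by_grader):
    """Pool rubric scores from all graders and average them.

    Assumption: scores are pooled across graders; the abstract does not
    specify the aggregation used in the study.
    """
    pooled = [s for scores in scores_by_grader.values() for s in scores]
    return sum(pooled) / len(pooled)


# Toy data, invented purely for illustration (not from the study).
toy_records = [
    ("advanced_amd", "yes", "yes"),
    ("advanced_amd", "no", "no"),
    ("pigmentary_abnormalities", "yes", "no"),
    ("pigmentary_abnormalities", "no", "no"),
    ("drusen_size", "large", "large"),
    ("drusen_size", "medium", "large"),
    ("drusen_size", "small", "small"),
    ("drusen_size", "large", "large"),
]
toy_grades = {"grader_a": [4, 3], "grader_b": [5, 3], "grader_c": [4, 2]}

accuracy = per_task_accuracy(toy_records)
mean_score = mean_grader_score(toy_grades)
```

Reporting accuracy per task rather than as a single pooled figure matches how the abstract presents its results, where the three tasks differ widely in difficulty (0.954 vs. 0.678).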