EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model

📅 2025-04-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address three critical bottlenecks in intelligent ophthalmic diagnosis (scarcity of high-quality multimodal data, absence of a systematic evaluation benchmark, and difficulty in fine-grained lesion identification), this work introduces Eyecare-100K (the first large-scale ophthalmic visual instruction dataset), Eyecare-Bench (a comprehensive benchmark spanning image understanding, lesion localization, and clinical reasoning), and EyecareGPT (a vision-language model enabling region-level fundus lesion interpretation). Methodologically, the authors propose an ophthalmology-specific multi-agent data engine, an adaptive resolution mechanism, and a layer-wise dense connector, combining multi-agent synthetic data generation, vision-language alignment fine-tuning, and multi-scale localization supervision. The approach is reported to achieve state-of-the-art performance across multiple ophthalmic tasks, improving lesion localization accuracy and clinical reasoning consistency. Code and datasets are publicly released.

📝 Abstract
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare, but their reliance on general medical data and coarse-grained global visual understanding limits them in intelligent ophthalmic diagnosis. Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data: the lack of deeply annotated, high-quality, multi-modal ophthalmic visual instruction data; (ii) Benchmark: the absence of a comprehensive and systematic benchmark for evaluating diagnostic performance; (iii) Model: the difficulty of adapting holistic visual architectures to fine-grained, region-specific ophthalmic lesion identification. In this paper, we propose the Eyecare Kit, which systematically tackles these three challenges with a tailored dataset, benchmark, and model. First, we construct a multi-agent data engine with real-life ophthalmology data to produce Eyecare-100K, a high-quality ophthalmic visual instruction dataset. Second, we design Eyecare-Bench, a benchmark that comprehensively evaluates the overall performance of LVLMs on intelligent ophthalmic diagnosis tasks across multiple dimensions. Finally, we develop EyecareGPT, a model thoroughly optimized for fine-grained ophthalmic visual understanding that incorporates an adaptive resolution mechanism and a layer-wise dense connector. Extensive experimental results indicate that EyecareGPT achieves state-of-the-art performance in a range of ophthalmic tasks, underscoring its significant potential for advancing open research in intelligent ophthalmic diagnosis. Our project is available at https://github.com/DCDmllm/EyecareGPT.
Problem

Research questions and friction points this paper is trying to address.

Lack of high-quality annotated ophthalmic visual data
Absence of comprehensive benchmark for ophthalmic diagnosis
Difficulty in adapting models for fine-grained lesion identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs high-quality ophthalmic dataset Eyecare-100K
Develops comprehensive benchmark Eyecare-Bench
Optimizes model EyecareGPT with adaptive resolution
Authors

Sijing Li (Zhejiang University): MLLM
Tianwei Lin (Zhejiang University): MLLMs
Lingshuai Lin (Harbin Institute of Technology): diffusion model, MLLM
Wenqiao Zhang (Zhejiang University)
Jiang Liu (Zhejiang University)
Xiaoda Yang (Zhejiang University)
Juncheng Li (East China Normal University): Super Resolution, Image Restoration, Computer Vision, Medical Image Analysis
Yucheng He (The First People’s Hospital of Chenzhou)
Xiaohui Song (Zhejiang University)
Jun Xiao (Zhejiang University)
Yueting Zhuang (Zhejiang University)
Beng Chin Ooi (National University of Singapore)