🤖 AI Summary
To address three critical bottlenecks in intelligent ophthalmic diagnosis—scarcity of high-quality multimodal data, absence of systematic evaluation benchmarks, and difficulty in fine-grained lesion identification—this work introduces Eyecare-100K (the first large-scale ophthalmic visual instruction dataset), Eyecare-Bench (a comprehensive benchmark spanning image understanding, lesion localization, and clinical reasoning), and EyecareGPT (a vision-language model enabling region-level fundus lesion interpretation). Methodologically, the work proposes an ophthalmology-specific visual instruction data engine, an adaptive resolution scaling mechanism, and inter-layer dense connectors, integrating multi-agent synthetic data generation, vision-language alignment fine-tuning, and multi-scale localization supervision. The approach achieves state-of-the-art performance across multiple tasks, significantly improving lesion localization accuracy and clinical reasoning consistency. All code and datasets are publicly released.
📝 Abstract
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare, but their reliance on general medical data and coarse-grained global visual understanding limit their effectiveness in intelligent ophthalmic diagnosis. Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data. The lack of deeply annotated, high-quality, multi-modal ophthalmic visual instruction data; (ii) Benchmark. The absence of a comprehensive and systematic benchmark for evaluating diagnostic performance; (iii) Model. The difficulty of adapting holistic visual architectures to fine-grained, region-specific ophthalmic lesion identification. In this paper, we propose the Eyecare Kit, which systematically tackles these three challenges with a tailored dataset, benchmark, and model. First, we construct a multi-agent data engine over real-world ophthalmology data to produce Eyecare-100K, a high-quality ophthalmic visual instruction dataset. Next, we design Eyecare-Bench, a benchmark that comprehensively evaluates the overall performance of LVLMs on intelligent ophthalmic diagnosis tasks across multiple dimensions. Finally, we develop EyecareGPT, thoroughly optimized for fine-grained ophthalmic visual understanding, which incorporates an adaptive resolution mechanism and a layer-wise dense connector. Extensive experimental results indicate that EyecareGPT achieves state-of-the-art performance across a range of ophthalmic tasks, underscoring its potential to advance open research in intelligent ophthalmic diagnosis. Our project is available at https://github.com/DCDmllm/EyecareGPT.
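The abstract does not spell out the layer-wise dense connector, but connectors of this family typically fuse hidden states from several vision-encoder layers (rather than only the final layer) before projecting them into the language model's embedding space, preserving the fine-grained detail needed for small lesions. Below is a minimal, dependency-free sketch of that idea; all dimensions, function names, and the toy data are hypothetical illustrations, not the paper's actual implementation.

```python
import random

def dense_connect(layer_features):
    """Fuse multiple encoder layers by concatenating, per token position,
    the feature vectors taken from each selected layer.
    layer_features: list of L layers, each a list of T tokens,
    each token a list of d floats.  Returns T tokens of dim L*d."""
    num_tokens = len(layer_features[0])
    fused = []
    for t in range(num_tokens):
        vec = []
        for layer in layer_features:
            vec.extend(layer[t])  # stack this layer's view of token t
        fused.append(vec)
    return fused

def linear_project(tokens, weight):
    """Project fused tokens into the LM embedding space.
    weight: out_dim rows, each of length in_dim (a plain linear map)."""
    return [[sum(w * x for w, x in zip(row, tok)) for row in weight]
            for tok in tokens]

# Toy demo (hypothetical sizes): 3 selected layers, 4 visual tokens,
# per-layer feature dim 8, LM embedding dim 6.
random.seed(0)
L, T, d, out_dim = 3, 4, 8, 6
feats = [[[random.random() for _ in range(d)] for _ in range(T)]
         for _ in range(L)]
W = [[random.random() for _ in range(L * d)] for _ in range(out_dim)]

fused = dense_connect(feats)          # 4 tokens, each 3*8 = 24-dim
projected = linear_project(fused, W)  # 4 tokens, each 6-dim
```

In a real model the projection would be a learned layer (with nonlinearity and normalization), but the structural point is the same: each visual token carries information from several encoder depths instead of a single global summary.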