RetinaLogos: Fine-Grained Synthesis of High-Resolution Retinal Images Through Captions

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Severe scarcity of annotated retinal imaging data hinders ophthalmic AI development and, in particular, limits fine-grained, anatomy- and pathology-controllable generation of color fundus photographs (CFPs). Method: We propose an LLM-driven structured text generation paradigm to construct the first synthetic dataset of 1.4 million image–text pairs, RetinaLogos-1400k, enabling precise semantic control over anatomical details, disease staging, and lesion types; we further design a three-stage diffusion model training framework to achieve fine-grained text–image alignment and medically controllable synthesis. Contribution/Results: In evaluation, ophthalmologists judged 62.07% of the synthesized images indistinguishable from real clinical images. The generated data improves accuracy by 10–25% in diabetic retinopathy grading and glaucoma detection, offering a scalable way to augment ophthalmic datasets with high-fidelity synthetic images.

📝 Abstract
The scarcity of high-quality, labelled retinal imaging data presents a significant challenge for developing machine learning models in ophthalmology. Existing methods for synthesising Colour Fundus Photographs (CFPs) rely primarily on predefined disease labels and therefore fail to generate images for broader categories with diverse and fine-grained anatomical structures. To overcome these challenges, we first introduce an innovative pipeline that creates a large-scale, synthetic Caption-CFP dataset comprising 1.4 million entries, called RetinaLogos-1400k. Specifically, RetinaLogos-1400k uses large language models (LLMs) to describe retinal conditions and key structures, such as optic disc configuration, vascular distribution, nerve fibre layers, and pathological features. Furthermore, based on this dataset, we employ a novel three-step training framework, called RetinaLogos, which enables fine-grained semantic control over retinal images and accurately captures different stages of disease progression, subtle anatomical variations, and specific lesion types. Extensive experiments demonstrate state-of-the-art performance across multiple datasets, with 62.07% of text-driven synthetic images judged indistinguishable from real ones by ophthalmologists. Moreover, the synthetic data improves accuracy by 10%–25% in diabetic retinopathy grading and glaucoma detection, thereby providing a scalable solution to augment ophthalmic datasets.
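The augmentation use case mentioned at the end of the abstract can be illustrated with a minimal sketch. It assumes samples are `(image path, label)` pairs and that synthetic images are simply added to the real training set at a chosen ratio; the function name `mix_datasets`, the `synthetic_ratio` parameter, and the file paths are hypothetical and are not taken from the paper, which does not state its mixing procedure here.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, int]  # (image path, class label); an assumed representation


def mix_datasets(
    real: List[Sample],
    synthetic: List[Sample],
    synthetic_ratio: float,
    seed: int = 0,
) -> List[Sample]:
    """Return all real samples plus a random subset of synthetic ones.

    synthetic_ratio is the number of synthetic samples added per real sample
    (hypothetical knob, capped by how many synthetic samples are available).
    """
    if synthetic_ratio < 0:
        raise ValueError("synthetic_ratio must be non-negative")
    rng = random.Random(seed)
    n_synthetic = min(len(synthetic), round(len(real) * synthetic_ratio))
    mixed = list(real) + rng.sample(list(synthetic), n_synthetic)
    rng.shuffle(mixed)  # avoid blocks of real-only or synthetic-only samples
    return mixed


# Illustrative inputs only; these paths do not refer to real files.
real_set = [(f"real_{i}.png", i % 2) for i in range(4)]
synthetic_set = [(f"synthetic_{i}.png", i % 2) for i in range(10)]
train_set = mix_datasets(real_set, synthetic_set, synthetic_ratio=0.5)
print(len(train_set))
```

Keeping every real sample and varying only the synthetic share makes it straightforward to compare a classifier trained with and without the added images.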
Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of high-quality labeled retinal imaging data
Overcoming limitations of predefined disease labels in CFP synthesis
Generating fine-grained retinal images with diverse anatomical structures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic retinal images via captions
Uses LLMs for detailed retinal condition descriptions
Three-step framework for fine-grained semantic control
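As a rough illustration of the caption-driven idea listed above, the sketch below assembles structured findings into a text prompt that an LLM could expand into a free-text caption. The `FundusFindings` record, its field names, and the prompt wording are assumptions made for this example; the paper's actual schema and prompts are not given on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FundusFindings:
    """Hypothetical structured record for one image; field names are illustrative."""

    optic_disc: str
    vessels: str
    nerve_fibre_layer: str
    lesions: List[str] = field(default_factory=list)
    disease_stage: str = "no apparent disease"


def build_caption_prompt(findings: FundusFindings) -> str:
    """Turn structured findings into an instruction an LLM could expand into a caption."""
    lesions = ", ".join(findings.lesions) if findings.lesions else "none reported"
    lines = [
        "Describe this colour fundus photograph in one clinical paragraph.",
        f"Optic disc: {findings.optic_disc}",
        f"Vascular distribution: {findings.vessels}",
        f"Nerve fibre layer: {findings.nerve_fibre_layer}",
        f"Lesions: {lesions}",
        f"Disease stage: {findings.disease_stage}",
    ]
    return "\n".join(lines)


# Illustrative values only, not drawn from the dataset.
example = FundusFindings(
    optic_disc="enlarged cup-to-disc ratio",
    vessels="mild arteriolar narrowing",
    nerve_fibre_layer="inferior thinning",
    lesions=["microaneurysms", "hard exudates"],
    disease_stage="moderate non-proliferative diabetic retinopathy",
)
print(build_caption_prompt(example))
```

Separating the structured record from the prompt text keeps each anatomical or pathological attribute individually controllable, which is the kind of fine-grained control the summary describes.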
Authors
Junzhi Ning
Shanghai AI Lab, China
Cheng Tang
Shanghai AI Lab, China; Shanghai Institute of Laser Technology, Shanghai 200233, China
Kaijin Zhou
Eye Hospital, Wenzhou Medical University, China
Diping Song
Shanghai AI Lab, China
Lihao Liu
Amazon
LLM-based Agent, Healthcare AI
Ming Hu
Monash University, Australia
Wei Li
Shanghai Jiao Tong University, China
Yanzhou Su
FZU, UESTC
medical image analysis
Tianbing Li
Shanghai AI Lab, China
Jiyao Liu
Fudan University, China
Yejin
Monash University, Australia
Sheng Zhang
Imperial College London, United Kingdom
Yuanfeng Ji
Stanford; HKU
Computer vision, Medical Image Analysis
Junjun He
Shanghai Jiao Tong University