🤖 AI Summary
Optical coherence tomography (OCT) sees limited deployment in resource-constrained settings because of its high cost and need for specialized operators. To address this, this study proposes a new paradigm: synthesizing high-fidelity 3D OCT volumes from low-cost 2D color fundus photographs. We introduce the first benchmark for cross-modal fundus-to-3D-OCT synthesis and establish a two-tiered evaluation framework (pixel-level B-scan similarity and semantic-level volumetric consistency) to rigorously assess clinical feasibility. Methodologically, the leading solutions integrate cross-modal collaborative preprocessing, pretraining on external fundus images, incorporation of vision foundation models, and customized 3D generative networks. In an international challenge involving 342 teams, the top-performing methods markedly improved anatomical fidelity and volumetric consistency. This work provides a practical, deployable solution for tele-ophthalmology and AI-assisted diagnosis in underserved regions.
📝 Abstract
Optical Coherence Tomography (OCT) provides high-resolution, 3D, non-invasive visualization of retinal layers in vivo, serving as a critical tool for lesion localization and disease diagnosis. However, its widespread adoption is limited by equipment costs and the need for specialized operators. By comparison, 2D color fundus photography offers faster acquisition and greater accessibility, with less dependence on expensive devices. Although generative artificial intelligence has demonstrated promising results in medical image synthesis, translating 2D fundus images into 3D OCT images presents unique challenges due to inherent differences in data dimensionality and biological information between the modalities. To advance generative models in the fundus-to-3D-OCT setting, the Asia Pacific Tele-Ophthalmology Society (APTOS-2024) organized a challenge titled Artificial Intelligence-based OCT Generation from Fundus Images. This paper details the challenge framework (referred to as the APTOS-2024 Challenge), including the benchmark dataset; the evaluation methodology featuring two fidelity metrics: image-based distance (pixel-level OCT B-scan similarity) and video-based distance (semantic-level volumetric consistency); and an analysis of top-performing solutions. The challenge attracted 342 participating teams, with 42 preliminary submissions and 9 finalists. Leading methodologies incorporated innovations in hybrid data preprocessing and augmentation (cross-modality collaborative paradigms), pretraining on external ophthalmic imaging datasets, integration of vision foundation models, and model architecture improvements. The APTOS-2024 Challenge is the first benchmark demonstrating the feasibility of fundus-to-3D-OCT synthesis as a potential solution for improving ophthalmic care accessibility in under-resourced healthcare settings, while helping to expedite medical research and clinical applications.
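The two-tier evaluation described above can be sketched in code. The challenge's exact metric definitions are not given here, so the functions below are hypothetical stand-ins: mean absolute error per B-scan for the pixel-level image-based distance, and cosine distance between volume-level feature embeddings for the semantic-level video-based distance (a video/3D encoder would supply the embeddings in practice).

```python
import numpy as np

def image_based_distance(real_vol, gen_vol):
    """Pixel-level distance: mean absolute error, averaged over B-scans.

    real_vol, gen_vol: arrays of shape (num_bscans, H, W), values in [0, 1].
    MAE is a stand-in; the challenge's exact pixel metric may differ.
    """
    assert real_vol.shape == gen_vol.shape
    per_scan = np.abs(real_vol - gen_vol).mean(axis=(1, 2))  # one score per B-scan
    return float(per_scan.mean())

def video_based_distance(real_vol, gen_vol, embed):
    """Semantic-level distance: compare volume-level feature embeddings.

    `embed` maps a (num_bscans, H, W) volume to a 1-D feature vector.
    Cosine distance is a stand-in for the challenge's video-based metric.
    """
    f_r, f_g = embed(real_vol), embed(gen_vol)
    cos = np.dot(f_r, f_g) / (np.linalg.norm(f_r) * np.linalg.norm(f_g) + 1e-12)
    return float(1.0 - cos)

# Toy usage with a trivial "encoder": mean intensity per B-scan.
rng = np.random.default_rng(0)
real = rng.random((16, 32, 32))
fake = np.clip(real + 0.05 * rng.standard_normal(real.shape), 0.0, 1.0)
toy_embed = lambda v: v.mean(axis=(1, 2))
print(image_based_distance(real, fake))             # small pixel-level error
print(video_based_distance(real, fake, toy_embed))  # small semantic distance
```

Separating the two tiers matters because a generated volume can score well slice-by-slice yet lack coherent 3D retinal structure across adjacent B-scans; the volume-level embedding distance is meant to catch exactly that failure mode.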