🤖 AI Summary
This study addresses the limited field of view of conventional color fundus photography in automated screening for diabetic retinopathy (DR) and diabetic macular edema (DME) by, for the first time, systematically evaluating Vision Transformers (ViTs) and foundation models on ultra-widefield (UWF) retinal imaging. The authors propose a feature-level fusion strategy that integrates spatial-domain (RGB) and frequency-domain representations, and they apply Grad-CAM to improve model interpretability. The approach demonstrates consistently strong performance across three critical tasks: image quality assessment, referable DR detection, and DME identification. Extensive experiments on the UWF4DR Challenge dataset validate the method's effectiveness and competitiveness, highlighting the potential of frequency-domain modeling and modern deep learning architectures for UWF-based retinal analysis.
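The summary does not include the authors' code, but a rough, non-authoritative sketch can make the spatial/frequency feature-level fusion idea concrete: feed the RGB image through one backbone, a log-magnitude FFT spectrum of the same image through a second backbone, and concatenate the two feature vectors before the classification head. Everything below (the `SpatialFrequencyFusion` class, the ResNet-18 stand-in backbones, and the spectrum preprocessing) is an illustrative assumption, not the authors' released implementation.

```python
import torch
import torch.nn as nn
from torchvision import models


class SpatialFrequencyFusion(nn.Module):
    """Two-branch classifier (illustrative): one branch sees the RGB image,
    the other its log-magnitude FFT spectrum; features are concatenated."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Spatial branch: any ImageNet-style backbone; ResNet-18 as a stand-in.
        self.spatial = models.resnet18(weights=None)
        feat_dim = self.spatial.fc.in_features
        self.spatial.fc = nn.Identity()
        # Frequency branch: same architecture applied to the spectrum image.
        self.frequency = models.resnet18(weights=None)
        self.frequency.fc = nn.Identity()
        self.head = nn.Linear(2 * feat_dim, num_classes)

    @staticmethod
    def to_spectrum(x: torch.Tensor) -> torch.Tensor:
        # Centered log-magnitude spectrum per channel, rescaled to [0, 1].
        spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
        mag = torch.log1p(spec.abs())
        lo = mag.amin(dim=(-2, -1), keepdim=True)
        hi = mag.amax(dim=(-2, -1), keepdim=True)
        return (mag - lo) / (hi - lo + 1e-8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_spatial = self.spatial(x)
        f_freq = self.frequency(self.to_spectrum(x))
        return self.head(torch.cat([f_spatial, f_freq], dim=1))


model = SpatialFrequencyFusion(num_classes=2)
logits = model(torch.randn(2, 3, 224, 224))  # e.g. referable DR vs. not
print(logits.shape)  # torch.Size([2, 2])
```

Concatenation is only one plausible fusion operator; the same two-branch layout would also accommodate averaging or attention-based mixing of the branch features.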
📝 Abstract
Diabetic retinopathy (DR) and diabetic macular edema (DME) are leading causes of preventable blindness among working-age adults. Traditional approaches in the literature rely on standard color fundus photography (CFP) to detect these conditions; however, recent ultra-widefield (UWF) imaging offers a significantly wider field of view than CFP. Motivated by this, the present study explores state-of-the-art deep learning (DL) methods on UWF imaging for three clinically relevant tasks: i) image quality assessment for UWF, ii) identification of referable diabetic retinopathy (RDR), and iii) identification of DME. Using the publicly available UWF4DR Challenge dataset, released as part of the MICCAI 2024 conference, we benchmark DL models in the spatial (RGB) and frequency domains, including popular convolutional neural networks (CNNs) as well as recent vision transformers (ViTs) and foundation models. In addition, we explore a feature-level fusion of both domains to increase robustness. Finally, we analyze the decisions of the DL models with Grad-CAM to increase explainability. Our proposal achieves consistently strong performance across all architectures, underscoring the competitiveness of emerging ViTs and foundation models and the promise of feature-level fusion and frequency-domain representations for UWF analysis.
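For readers unfamiliar with Grad-CAM, the explainability tool the abstract mentions, the sketch below shows the standard recipe: the activation maps of a late convolutional layer are weighted by the spatially averaged gradients of the target class score, then passed through a ReLU and upsampled into a heatmap. The `grad_cam` helper, the ResNet-18 model, and the choice of `model.layer4[-1]` as target layer are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models


def grad_cam(model, target_layer, image, class_idx):
    """Plain Grad-CAM: weight the target layer's activation maps by the
    spatially averaged gradients of the chosen class score."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        score = model(image)[0, class_idx]
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    weights = grads[0].mean(dim=(-2, -1), keepdim=True)         # GAP over gradients
    cam = F.relu((weights * acts[0]).sum(dim=1, keepdim=True))  # weighted sum + ReLU
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # normalize to [0, 1]


model = models.resnet18(weights=None).eval()
heatmap = grad_cam(model, model.layer4[-1], torch.randn(1, 3, 224, 224), class_idx=1)
print(heatmap.shape)  # torch.Size([1, 1, 224, 224])
```

In a screening setting like RDR detection, such a heatmap would typically be overlaid on the UWF image to check whether the model attends to clinically plausible regions such as lesions rather than imaging artifacts.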