🤖 AI Summary
In multimodal ophthalmic diagnosis, optical coherence tomography (OCT) imaging is costly, fundus-OCT paired data are scarce, and unimodal methods struggle to model fine-grained spatial lesion distributions. Method: We propose an unpaired multimodal learning framework that constructs OCT-derived lesion spatial preference matrices—encoding disease-specific anatomical region affinities—and explicitly models these affinities via contrastive learning in the OCT latent space. Disease text descriptions serve as semantic bridges that align fundus images with OCT spatial priors without requiring paired data. Contribution/Results: The framework provides dynamic spatial guidance for fundus image classification. Evaluated on nine datasets covering 28 ocular diseases, it significantly outperforms state-of-the-art unimodal and multimodal baselines, improving both diagnostic accuracy and interpretability.
📝 Abstract
Significant advancements in AI-driven multimodal medical image diagnosis have led to substantial improvements in ophthalmic disease identification in recent years. However, acquiring paired multimodal ophthalmic images remains prohibitively expensive. While fundus photography is simple and cost-effective, the limited availability of OCT data and the resulting modality imbalance hinder further progress. Conventional approaches that rely solely on fundus or textual features often fail to capture fine-grained spatial information, even though each imaging modality provides distinct cues about lesion predilection sites. In this study, we propose UOPSL, a novel unpaired multimodal framework that exploits extensive OCT-derived spatial priors to dynamically identify predilection sites and thereby enhance fundus image-based disease recognition. Our approach bridges unpaired fundus and OCT images via extended disease text descriptions. We first apply contrastive learning to a large corpus of unpaired OCT and fundus images while simultaneously learning a predilection-sites matrix in the OCT latent space; through extensive optimization, this matrix captures lesion localization patterns within the OCT feature space. During fine-tuning and inference for downstream classification tasks based solely on fundus images, where paired OCT data are unavailable, we drop the OCT input and instead use the learned predilection-sites matrix to guide fundus image classification. Extensive experiments on 9 diverse datasets spanning 28 critical disease categories demonstrate that our framework outperforms existing state-of-the-art methods.
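To make the two-stage recipe concrete, the following is a minimal NumPy sketch of the idea, not the authors' implementation: random vectors stand in for the OCT, fundus, and disease-text encoder outputs, an InfoNCE-style contrastive loss aligns modalities during the unpaired pre-training stage, and a learnable disease-by-region matrix `M` (the predilection-sites matrix) mixes hypothetical anatomical region embeddings into per-disease spatial priors. All names (`info_nce`, `region_basis`, `spatial_prior`, the specific loss combination) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2norm(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def info_nce(anchors, positives, temperature=0.07):
    """InfoNCE contrastive loss: the i-th anchor should match the i-th positive."""
    logits = l2norm(anchors) @ l2norm(positives).T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()

n_diseases, n_regions, dim = 28, 16, 64

# Stand-ins for frozen/learned encoder outputs (one sample per disease here).
oct_feats = rng.normal(size=(n_diseases, dim))     # OCT image embeddings
fundus_feats = rng.normal(size=(n_diseases, dim))  # fundus image embeddings
text_feats = rng.normal(size=(n_diseases, dim))    # disease text embeddings
region_basis = rng.normal(size=(n_regions, dim))   # anatomical region embeddings (assumed)

# Learnable predilection-sites matrix: disease-to-region affinities.
M = rng.normal(scale=0.1, size=(n_diseases, n_regions))

# Each disease's spatial prior: affinity-weighted mix of region embeddings.
spatial_prior = softmax(M) @ region_basis          # (n_diseases, dim)

# Stage 1 (unpaired pre-training): align OCT features with the spatial prior,
# and tie both imaging modalities to the shared text embeddings, so text acts
# as the semantic bridge between unpaired fundus and OCT data.
pretrain_loss = (info_nce(oct_feats, spatial_prior)
                 + info_nce(oct_feats, text_feats)
                 + info_nce(fundus_feats, text_feats))

# Stage 2 (fundus-only fine-tuning/inference): the OCT input is dropped; the
# learned matrix M supplies spatial guidance through text-bridged priors.
fundus_logits = l2norm(fundus_feats) @ l2norm(text_feats + spatial_prior).T
```

In a real system, `pretrain_loss` would be minimized by gradient descent jointly over the encoders and `M`; here the point is only the data flow: OCT never appears in the stage-2 computation, yet its spatial knowledge persists in `M`.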