Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multimodal ophthalmic diagnosis, unequal access to medical resources often leads to missing modalities (e.g., fundus photography or OCT alone), undermining model performance. Existing modality completion and knowledge distillation methods suffer from lesion reconstruction artifacts and strong reliance on fully paired multimodal data. Method: We propose a robust cross-modal alignment framework based on labeled optimal transport. It introduces a novel class-level prototype-guided semantic alignment mechanism, integrated with asymmetric cross-modal feature sharing and label-driven soft matching, enabling multiscale complementary information fusion under modality absence. Contribution/Results: Evaluated on three large-scale multimodal ophthalmic datasets, our method achieves state-of-the-art performance in both complete and missing-modality settings. It significantly improves robustness and generalizability in disease grading—particularly under real-world data scarcity and modality imbalance—without requiring full modality pairing or pixel-level reconstruction.

📝 Abstract
Multimodal ophthalmic imaging-based diagnosis integrates color fundus images with optical coherence tomography (OCT) to provide a comprehensive view of ocular pathologies. However, the uneven global distribution of healthcare resources means that real-world clinical scenarios often encounter incomplete multimodal data, which significantly compromises diagnostic accuracy. Commonly used pipelines, such as modality imputation and distillation, face notable limitations: 1) imputation methods struggle to accurately reconstruct key lesion features, since OCT lesions are localized while fundus images vary in style; 2) distillation methods rely heavily on fully paired multimodal training data. To address these challenges, we propose a novel multimodal alignment and fusion framework that robustly handles missing modalities in ophthalmic diagnosis. Accounting for the distinctive feature characteristics of OCT and fundus images, we emphasize the alignment of semantic features within the same category and explicitly learn a soft matching between modalities, allowing a missing modality to exploit the information of the available one and achieving robust cross-modal feature alignment even when a modality is absent. Specifically, we leverage optimal transport for multi-scale modality feature alignment: class-wise alignment through predicted class prototypes and feature-wise alignment via cross-modal shared feature transport. Furthermore, we propose an asymmetric fusion strategy that effectively exploits the distinct characteristics of the OCT and fundus modalities. Extensive evaluations on three large ophthalmic multimodal datasets demonstrate our model's superior performance under various modality-incomplete scenarios, achieving state-of-the-art performance in both complete-modality and modality-incomplete conditions. Code is available at https://github.com/Qinkaiyu/RIMA
Problem

Research questions and friction points this paper is trying to address.

Handles incomplete multimodal ophthalmic data for diagnosis
Aligns OCT and fundus features despite missing modalities
Improves diagnostic accuracy with optimal transport alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal Transport for multi-scale feature alignment
Class-wise and feature-wise modality alignment
Asymmetric fusion of OCT and fundus features
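To make the soft-matching idea concrete, here is a minimal NumPy sketch of entropic optimal transport (Sinkhorn iterations) coupling features from one modality to class prototypes of another. The `sinkhorn` function, the toy feature dimensions, and the squared-Euclidean cost are illustrative assumptions for exposition, not the paper's actual implementation:

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, n_iters=200):
    """Entropic-regularized OT via Sinkhorn iterations.
    cost: (m, n) cost matrix; a: (m,) source mass; b: (n,) target mass.
    Returns a transport plan T whose row sums are a and column sums approach b."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # scale columns to match b
        u = a / (K @ v)              # scale rows to match a
    return u[:, None] * K * v[None, :]

# Toy example: softly match 4 fundus features (2-D) to 3 OCT class prototypes.
rng = np.random.default_rng(0)
fundus_feats = rng.normal(size=(4, 2))
oct_protos = rng.normal(size=(3, 2))

# Cost: squared Euclidean distance between each feature and each prototype.
cost = ((fundus_feats[:, None, :] - oct_protos[None, :, :]) ** 2).sum(-1)

a = np.full(4, 1 / 4)  # uniform mass over fundus features
b = np.full(3, 1 / 3)  # uniform mass over class prototypes
T = sinkhorn(cost, a, b)

# Each row of T distributes a fundus feature's mass over the OCT prototypes,
# i.e. a soft matching that survives when one modality is missing.
print(np.allclose(T.sum(axis=1), a))  # → True (row marginals match)
```

In a labeled-OT setting, the cost matrix would additionally be biased by class labels (cheap transport within the same predicted class, expensive across classes), which is the label-driven soft matching the summary describes.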
Qinkai Yu
University of Exeter
Medical Image Analysis, Computer Vision, Large Language Models
Jianyang Xie
Eye and Vision Sciences Department, University of Liverpool, Liverpool, UK
Yitian Zhao
Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences
Medical Imaging, Computer Vision, Pattern Recognition
Cheng Chen
Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong, China
Lijun Zhang
Key Laboratory of System Software, Institute of Software, Chinese Academy of Sciences, Beijing, China
Liming Chen
School of Computer Science and Technology, Dalian University of Technology, Dalian, China
Jun Cheng
Institute for Infocomm Research, A*STAR, Singapore
Lu Liu
Computer Science Department, University of Exeter, Exeter, UK
Yalin Zheng
University of Liverpool
Image Processing, Computer Vision, Machine Learning and Medical Image Analysis
Yanda Meng
University of Exeter
Medical Image Analysis