🤖 AI Summary
Early diagnosis of glaucoma is often hindered by the limited information a single imaging modality provides, which leads to missed detections. To address this, this work proposes an Iterative Multimodal Optimization (IMO) model that, for the first time, incorporates a denoising diffusion mechanism into joint multimodal glaucoma diagnosis. The model fuses fundus and OCT features at an intermediate (mid-level) stage, uses a cross-modal feature alignment module to mitigate discrepancies between the two modalities, and jointly refines optic disc/cup segmentation and glaucoma classification within an iterative refinement decoder. This design lets the segmentation and classification tasks reinforce each other, and the model achieves state-of-the-art performance on both. The approach significantly improves multimodal feature integration and offers a reliable, comprehensive assessment framework for clinical use.
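The summary does not say how the cross-modal feature alignment (CMFA) module works. As a rough illustration of "align, then fuse at mid level", the sketch below assumes a simple AdaIN-style moment-matching step: each OCT feature channel is rescaled to the mean and standard deviation of the matching fundus channel, and the two feature sets are then concatenated along the channel axis. `align_to_reference`, `mid_level_fusion`, and the moment-matching rule are hypothetical stand-ins, not the authors' CMFA. The sketch also assumes both encoders emit the same number of channels on one shared spatial grid.

```python
import math
from typing import List, Tuple

# A feature map is stored as channels x positions (a flattened H*W grid).
FeatureMap = List[List[float]]


def channel_stats(fmap: FeatureMap, eps: float = 1e-5) -> List[Tuple[float, float]]:
    """Per-channel (mean, std); eps keeps the std non-zero for flat channels."""
    stats = []
    for ch in fmap:
        mu = sum(ch) / len(ch)
        var = sum((v - mu) ** 2 for v in ch) / len(ch)
        stats.append((mu, math.sqrt(var + eps)))
    return stats


def align_to_reference(src: FeatureMap, ref: FeatureMap) -> FeatureMap:
    """HYPOTHETICAL stand-in for CMFA: rescale each src channel so that its
    mean and std match the corresponding ref channel (AdaIN-style)."""
    if len(src) != len(ref):
        raise ValueError("both modalities need the same number of channels")
    aligned = []
    for ch, (mu_s, sd_s), (mu_r, sd_r) in zip(src, channel_stats(src), channel_stats(ref)):
        aligned.append([(v - mu_s) / sd_s * sd_r + mu_r for v in ch])
    return aligned


def mid_level_fusion(fundus: FeatureMap, oct_feats: FeatureMap) -> FeatureMap:
    """Fuse encoder features (not raw pixels, not final predictions): align the
    OCT features to fundus statistics, then concatenate along the channel axis."""
    n = len(fundus[0])
    if any(len(ch) != n for ch in fundus + oct_feats):
        raise ValueError("features must be resampled to one spatial grid first")
    return fundus + align_to_reference(oct_feats, fundus)


fundus = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0]]
oct_feats = [[10.0, 20.0, 30.0, 40.0], [5.0, 5.0, 7.0, 7.0]]
fused = mid_level_fusion(fundus, oct_feats)
print(len(fused))  # 4 channels: 2 fundus + 2 aligned OCT
```

Moment matching is only one cheap way to shrink the distribution gap between modalities before fusion; a learned module (for example, projections or attention across modalities) could replace `align_to_reference` without changing the fusion step. The repository linked in the abstract is the authoritative reference for what CMFA actually does.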
📝 Abstract
Accurate diagnosis of glaucoma is challenging because early-stage changes are subtle and often lack clear structural or appearance cues. Most existing approaches rely on a single modality, such as fundus photography or optical coherence tomography (OCT), and therefore capture only partial pathological information, often missing early disease progression. In this paper, we propose an iterative multimodal optimization model (IMO) for joint optic disc/cup segmentation and glaucoma grading. IMO integrates fundus and OCT features through a mid-level fusion strategy, with a cross-modal feature alignment (CMFA) module that reduces discrepancies between the two modalities. An iterative refinement decoder then progressively optimizes the fused features through a denoising diffusion mechanism, enabling fine-grained segmentation of the optic disc and cup while supporting accurate glaucoma grading.
Extensive experiments show that our method effectively integrates multimodal features, providing a comprehensive and clinically meaningful approach to glaucoma assessment. Source code is available at https://github.com/warren-wzw/IMO.git.
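The abstract does not spell out the decoder's diffusion formulation. As a rough illustration of how a denoising diffusion loop refines an output step by step, the sketch below assumes a deterministic DDIM-style sampler (eta = 0) with the common linear beta schedule (1e-4 to 0.02 over 1000 steps), applied to a per-pixel segmentation mask rather than to the multimodal features IMO refines. `toy_denoiser`, `refine_mask`, and the schedule are assumptions for illustration; in IMO the denoiser would be the learned decoder, which also supports the glaucoma grade (not modeled here).

```python
import math
import random
from typing import Callable, List

Denoiser = Callable[[List[float], List[float], int], List[float]]


def linear_alpha_bars(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def toy_denoiser(x_t: List[float], cond: List[float], t: int) -> List[float]:
    """HYPOTHETICAL stand-in for the learned decoder: predicts a clean mask x0
    from the noisy mask and the fused multimodal features (ignores t)."""
    return [1.0 / (1.0 + math.exp(-(c + x))) for c, x in zip(cond, x_t)]


def refine_mask(cond: List[float], denoiser: Denoiser = toy_denoiser,
                T: int = 1000, seed: int = 0) -> List[float]:
    """Deterministic DDIM-style (eta = 0) reverse loop: start from Gaussian
    noise and repeatedly re-estimate the mask, conditioned on `cond`."""
    rng = random.Random(seed)
    ab = linear_alpha_bars(T)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # initialise from N(0, I)
    x0_hat = x
    for t in range(T - 1, -1, -1):
        x0_hat = denoiser(x, cond, t)
        # Noise implied by the estimate: eps = (x_t - sqrt(ab_t) * x0) / sqrt(1 - ab_t)
        eps_hat = [(xi - math.sqrt(ab[t]) * x0) / math.sqrt(1.0 - ab[t])
                   for xi, x0 in zip(x, x0_hat)]
        ab_prev = ab[t - 1] if t > 0 else 1.0
        # Move to step t-1 along the deterministic trajectory.
        x = [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
             for x0, e in zip(x0_hat, eps_hat)]
    return x0_hat  # equals x after the final step, since ab_prev = 1 there


# Strong positive/negative evidence in the fused features at alternating pixels.
mask = refine_mask([6.0, -6.0, 6.0, -6.0])
print([1 if m > 0.5 else 0 for m in mask])  # [1, 0, 1, 0]
```

Each step re-estimates the clean mask from the current noisy estimate plus the conditioning features and then re-noises it to a slightly lower noise level, which is the sense in which a diffusion decoder refines its output iteratively. A trained network would replace the sigmoid in `toy_denoiser`, and a practical sampler would typically visit only a subset of the 1000 steps.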