AI Summary
This study addresses the challenge of precise glaucoma staging, which is hindered by the scarcity of multimodal imaging data and difficulties in cross-modal fusion. To this end, we introduce GLEAM, the first publicly available trimodal glaucoma dataset encompassing fundus photographs, optical coherence tomography (OCT) scans, and visual field maps. We further propose a Hierarchical Attention Masked Modeling (HAMM) framework that leverages a lightweight encoder to jointly learn visual, structural, and functional representations. By effectively capturing complementary information across modalities, HAMM significantly improves accuracy in four-stage glaucoma classification, thereby providing both a high-quality data foundation and an efficient algorithmic solution to support clinical diagnosis.
Abstract
We propose glaucoma lesion evaluation and analysis with multimodal imaging (GLEAM), the first publicly available trimodal glaucoma dataset comprising scanning laser ophthalmoscopy fundus images, circumpapillary OCT images, and visual field pattern deviation maps, annotated with four disease stages. By enabling effective exploitation of complementary multimodal information, GLEAM facilitates accurate diagnosis and treatment across disease stages. To integrate cross-modal information effectively, we propose hierarchical attentive masked modeling (HAMM) for multimodal glaucoma classification. Our framework pairs hierarchical attentive encoders with light decoders, concentrating cross-modal representation learning in the encoder.
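The encoder-heavy masked-modeling design described above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the token dimensions, masking ratio, and all function and variable names (`attention`, `tokens`, `mask`) are illustrative assumptions. The sketch shows the asymmetry HAMM relies on: the attentive encoder processes only the visible tokens pooled from the three modalities, while a light decoder merely queries the encoded set to reconstruct masked positions.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention(q, k, v):
    # Scaled dot-product attention with a numerically stable softmax.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

# Toy token sequences for the three modalities (fundus, OCT, visual field):
# 4 tokens per modality, embedding dimension 8 -- purely illustrative sizes.
tokens = {m: rng.normal(size=(4, 8)) for m in ("fundus", "oct", "vf")}

# Randomly mask roughly half the tokens in each modality.
mask = {m: rng.random(4) < 0.5 for m in tokens}

# Encoder: cross-modal self-attention over the *visible* tokens only,
# so representation learning is concentrated here.
visible = np.concatenate([tokens[m][~mask[m]] for m in tokens])
encoded = attention(visible, visible, visible)

# Light decoder: masked positions attend to the encoded visible set
# to reconstruct their content.
queries = np.concatenate([tokens[m][mask[m]] for m in tokens])
reconstructed = attention(queries, encoded, encoded)

print(encoded.shape, reconstructed.shape)
```

Because the decoder only performs a single cross-attention over the encoder's output, its cost stays small, which is the point of placing the representational burden on the encoder.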