AI Summary
Existing retinal analysis methods typically model fundus images and clinical risk factors separately, so they capture neither joint multimodal patterns nor similarities between patients. This work proposes a unified multimodal representation learning framework that aligns color fundus photographs with individualized disease risk profiles to predict incident Alzheimer's disease and dementia, on average eight years before clinical diagnosis (range: 1-11 years). The approach converts structured risk questionnaires into clinically interpretable text and introduces a Group-Aware Contrastive Learning (GACL) strategy that clusters patients with similar retinal morphometry and risk factors and treats them as positive pairs, strengthening cross-modal alignment. The method substantially outperforms state-of-the-art retinal imaging models paired with clinical text encoders, as well as general-purpose vision-language models, on early risk stratification.
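To make the questionnaire-to-text step concrete, here is a minimal sketch of how a structured risk record might be rendered as a short narrative for a text encoder. The field names (`age`, `sex`, `smoking`, `hypertension`, `diabetes`) and the sentence templates are illustrative assumptions, not the questionnaire items or wording used in the paper.

```python
def risk_profile_to_text(profile: dict) -> str:
    """Render a structured risk record as a short narrative.

    All field names and templates are hypothetical; fields that are
    missing from the record are simply skipped.
    """
    parts = []
    if "age" in profile and "sex" in profile:
        parts.append(f"A {profile['age']}-year-old {profile['sex']} participant.")
    if "smoking" in profile:
        parts.append(f"Smoking status: {profile['smoking']}.")
    # Binary conditions become explicit positive or negative statements.
    for condition in ("hypertension", "diabetes"):
        if profile.get(condition) is not None:
            status = "has a" if profile[condition] else "has no"
            parts.append(f"The participant {status} history of {condition}.")
    return " ".join(parts)


example = risk_profile_to_text(
    {"age": 67, "sex": "female", "smoking": "former",
     "hypertension": True, "diabetes": False}
)
print(example)
```

Writing negative findings out explicitly ("has no history of diabetes") rather than omitting them is one possible design choice; it keeps an absent condition distinguishable from an unanswered question.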
Abstract
The retina provides a unique, noninvasive window into Alzheimer's disease (AD) and dementia, capturing early structural changes through morphometric features, while systemic and lifestyle risk factors reflect well-established contributors to disease susceptibility long before clinical symptom onset. However, current retinal analysis frameworks typically model imaging and risk factors separately, limiting their ability to capture joint multimodal patterns critical for early risk prediction. Moreover, existing methods rarely incorporate mechanisms to organize or align patients with similar retinal and clinical characteristics, constraining the learning of coherent cross-modal associations. To address these limitations, we introduce REVEAL (REtinal-risk Vision-Language Early Alzheimer's Learning), a framework that aligns color fundus photographs with individualized disease-specific risk profiles for predicting incident AD and dementia, on average 8 years before diagnosis (range: 1-11 years). Because real-world risk factors are structured questionnaire data, we translate them into clinically interpretable narratives compatible with pretrained vision-language models (VLMs). We further propose a group-aware contrastive learning (GACL) strategy that clusters patients with similar retinal morphometry and risk factors as positive pairs, strengthening multimodal alignment. This unified representation learning framework substantially outperforms state-of-the-art retinal imaging models paired with clinical text encoders, as well as general-purpose VLMs, demonstrating the value of jointly modeling retinal biomarkers and clinical risk factors. By providing a generalizable and noninvasive approach for early AD and dementia risk stratification, REVEAL has the potential to enable earlier intervention and improve preventive care at the population level.
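The group-aware idea can be illustrated with a multi-positive contrastive loss. The sketch below is a generic supervised-contrastive-style formulation, not the loss from the paper, whose exact form is not given here. It assumes that cluster labels (`groups`) have already been computed, for example by clustering retinal morphometric and risk-factor features, and it covers only the image-to-text direction. The function names and the `temperature` default are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def group_aware_contrastive_loss(image_emb, text_emb, groups, temperature=0.07):
    """Image-to-text contrastive loss with several positives per anchor.

    image_emb[i] and text_emb[i] belong to patient i, and groups[i] is that
    patient's (assumed precomputed) cluster label. Every text embedding in
    the same cluster as patient i, including the patient's own, is treated
    as a positive for image i; all other texts act as negatives.
    """
    n = len(image_emb)
    total = 0.0
    for i in range(n):
        logits = [cosine(image_emb[i], text_emb[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidate texts.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        positives = [j for j in range(n) if groups[j] == groups[i]]
        # Average negative log-probability over the positives of anchor i.
        total -= sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n
```

When every patient sits in a singleton cluster, this reduces to the usual one-positive InfoNCE objective. Grouping similar patients widens the positive set, so the model is no longer penalized for placing their embeddings close together.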