🤖 AI Summary
This study addresses early diagnosis of Alzheimer’s disease (AD) with a lightweight dual-modality framework that fuses structural MRI (sMRI) with Jacobian determinant maps (JSM). To bridge the two modalities, it applies a cross-attention mechanism that explicitly models the relationship between sMRI intensity and JSM-derived local deformation. Cross-attention is compared against pairwise self-attention and bottleneck attention on features from four pre-trained 3D image encoders. The resulting model has only 1.56M parameters, roughly 40× fewer than ResNet-34 (63M) and Swin UNETR (61.98M). On the ADNI dataset, it achieves mean ROC-AUC scores of 0.903±0.033 for AD vs. cognitively normal (CN) classification and 0.692±0.061 for mild cognitive impairment (MCI) vs. CN, outperforming the other fusion strategies. The core contribution is an sMRI–JSM cross-attention fusion approach that pairs competitive diagnostic performance with low computational overhead.
📝 Abstract
Early diagnosis of Alzheimer's disease (AD) is critical for intervention before irreversible neurodegeneration occurs. Structural MRI (sMRI) is widely used for AD diagnosis, but conventional deep learning approaches primarily rely on intensity-based features, which require large datasets to capture subtle structural changes. Jacobian determinant maps (JSM) provide complementary information by encoding localized brain deformations, yet existing multimodal fusion strategies fail to fully integrate these features with sMRI. We propose a cross-attention fusion framework to model the intrinsic relationship between sMRI intensity and JSM-derived deformations for AD classification. Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, we compare cross-attention, pairwise self-attention, and bottleneck attention with four pre-trained 3D image encoders. Cross-attention fusion achieves superior performance, with mean ROC-AUC scores of 0.903 (±0.033) for AD vs. cognitively normal (CN) and 0.692 (±0.061) for mild cognitive impairment (MCI) vs. CN. Despite its strong performance, our model remains highly efficient, with only 1.56 million parameters, over 40 times fewer than ResNet-34 (63M) and Swin UNETR (61.98M). These findings demonstrate the potential of cross-attention fusion for improving AD diagnosis while maintaining computational efficiency.
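To make the fusion idea concrete, here is a minimal, illustrative sketch of single-head scaled dot-product cross-attention in plain Python. It is not the authors' architecture. The token counts, embedding sizes, random projection matrices, and the choice to let sMRI tokens act as queries over JSM tokens are all assumptions made for illustration; the abstract specifies neither the attention direction nor the layer configuration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: queries come from one modality,
    keys and values from the other."""
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, head_dim = 8, 4                      # hypothetical embedding sizes
smri_tokens = random_matrix(5, dim, rng)  # stand-in for encoded sMRI patches
jsm_tokens = random_matrix(7, dim, rng)   # stand-in for encoded JSM patches
w_q, w_k, w_v = (random_matrix(dim, head_dim, rng) for _ in range(3))

# Assumed direction: each sMRI token attends over all JSM tokens.
fused, weights = cross_attention(smri_tokens, jsm_tokens, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over the JSM tokens, so every fused sMRI token is a weighted mix of deformation features. In self-attention, by contrast, queries, keys, and values would all come from the same token set.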