🤖 AI Summary
Robust multimodal neuroimaging fusion is urgently needed for early Alzheimer's disease (AD) diagnosis. Existing MRI-PET fusion models overemphasize cross-modal complementarity while neglecting modality-specific discriminative features and failing to mitigate representation bias induced by inherent inter-modal distribution shifts. To address these limitations, we propose a Consistency-Guided Collaborative Attention Fusion framework (CA-CF), featuring: (1) a shared-independent dual-path encoder that jointly captures cross-modal commonalities and modality-specific characteristics; (2) a learnable parameterized representation module that explicitly compensates for missing modality information; and (3) a latent-space consistency-based distribution alignment mechanism that jointly optimizes feature distribution matching and classification objectives. Evaluated on the ADNI dataset, CA-CF significantly outperforms state-of-the-art multimodal methods in three-class classification (AD/MCI/NC), achieving superior accuracy and generalizability. The framework offers a more reliable and interpretable fusion paradigm for clinical early diagnosis.
📝 Abstract
Alzheimer's disease (AD) is the most prevalent form of dementia, and its early diagnosis is essential for slowing disease progression. Recent studies on multimodal neuroimaging fusion using MRI and PET have achieved promising results by integrating multi-scale complementary features. However, most existing approaches primarily emphasize cross-modal complementarity while overlooking the diagnostic importance of modality-specific features. In addition, the inherent distributional differences between modalities often lead to biased and noisy representations, degrading classification performance. To address these challenges, we propose a Collaborative Attention and Consistency-Guided Fusion framework for MRI- and PET-based AD diagnosis. The proposed model introduces a learnable parameter representation (LPR) block to compensate for missing modality information, followed by a shared encoder and modality-independent encoders to preserve both shared and modality-specific representations. Furthermore, a consistency-guided mechanism is employed to explicitly align the latent distributions across modalities. Experimental results on the ADNI dataset demonstrate that our method achieves superior diagnostic performance compared with existing fusion strategies.
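The pipeline the abstract describes (LPR fill-in for missing scans, a shared encoder plus modality-independent encoders, and a consistency penalty aligning the shared latents) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' architecture: the feature dimensions, linear-tanh encoders, concatenation fusion, and MSE consistency term are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened per-subject features -> latent code
d_in, d_lat, batch = 64, 16, 8

# One weight matrix per encoder path (shared, MRI-specific, PET-specific)
params = {name: rng.normal(scale=0.1, size=shape) for name, shape in {
    "W_shared": (d_in, d_lat), "b_shared": (d_lat,),
    "W_mri": (d_in, d_lat), "b_mri": (d_lat,),
    "W_pet": (d_in, d_lat), "b_pet": (d_lat,),
}.items()}

mri = rng.normal(size=(batch, d_in))
pet = rng.normal(size=(batch, d_in))

# LPR block (assumption): a learnable vector stands in for a missing PET scan
lpr_pet = rng.normal(scale=0.1, size=(d_in,))
missing = np.zeros(batch, dtype=bool)
missing[2] = True  # e.g. subject 2 has no PET scan
pet_filled = np.where(missing[:, None], lpr_pet, pet)

def encode(x, W, b):
    """One encoder path: a linear map with tanh nonlinearity."""
    return np.tanh(x @ W + b)

# Shared encoder applied to both modalities captures cross-modal commonality;
# the independent encoders preserve modality-specific features.
z_shared_mri = encode(mri, params["W_shared"], params["b_shared"])
z_shared_pet = encode(pet_filled, params["W_shared"], params["b_shared"])
z_mri = encode(mri, params["W_mri"], params["b_mri"])
z_pet = encode(pet_filled, params["W_pet"], params["b_pet"])

# Consistency-guided alignment: penalize divergence between the two
# modalities' shared latents (MSE is one simple choice of alignment loss,
# added to the classification objective during training).
consistency_loss = float(np.mean((z_shared_mri - z_shared_pet) ** 2))

# Fusion by concatenating the averaged shared latent with both specific latents
fused = np.concatenate([0.5 * (z_shared_mri + z_shared_pet), z_mri, z_pet], axis=1)
```

In a real model the encoders would be 3D CNNs or transformer blocks and `lpr_pet` would be trained jointly with the network, but the control flow (fill in, encode twice, align, fuse, classify) follows the abstract's description.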