AI Summary
Predicting conversion from mild cognitive impairment (MCI) to Alzheimer's disease (AD) remains challenging due to frequently missing PET scans, strong inter-modal heterogeneity, and inefficient multimodal fusion. To address these issues, we propose a robust and interpretable incomplete trimodal fusion framework. Our method introduces an MRI-guided PET synthesis module to impute missing PET scans, employs modality-specific encoders with channel-wise aggregation, and incorporates a trimodal cooperative attention mechanism alongside a cross-modal alignment loss to jointly optimize structural information integration and feature alignment. Evaluated on the ADNI1/2 datasets, our approach consistently outperforms unimodal baselines and state-of-the-art multimodal methods, achieving up to a 5.2% improvement in AUC. The source code is publicly available.
Abstract
Alzheimer's disease (AD) is a common neurodegenerative disease among the elderly. Early prediction and timely intervention at its prodromal stage, mild cognitive impairment (MCI), can reduce the risk of progression to AD. Combining information from multiple modalities can significantly improve predictive accuracy. However, challenges such as missing data and heterogeneity across modalities complicate multimodal learning, and adding more modalities can worsen these issues. Current multimodal fusion techniques often fail to adapt to the complexity of medical data, hindering the ability to identify relationships between modalities. To address these challenges, we propose a multimodal approach for predicting MCI conversion that specifically targets missing positron emission tomography (PET) data and the integration of diverse medical information. The proposed incomplete triple-modal MCI conversion prediction network is tailored for this purpose. Through the missing modal generation module, we synthesize the missing PET data from magnetic resonance imaging and extract features using specifically designed encoders. We also develop a channel aggregation module and a triple-modal co-attention fusion module to reduce feature redundancy and achieve effective multimodal data fusion. Furthermore, we design a loss function to handle missing modalities and align cross-modal features. These components collectively harness multimodal data to boost network performance. Experimental results on the ADNI1 and ADNI2 datasets show that our method significantly surpasses existing unimodal and multimodal models. Our code is available at https://github.com/justinhxy/ITFC.
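To make the fusion idea concrete, below is a minimal NumPy sketch of one plausible reading of the triple-modal co-attention step: each modality's token features attend over the concatenated tokens of the other two modalities, and the attended outputs are pooled and concatenated into a fused vector. The token counts, feature dimension, and the "attend to the other two, then mean-pool" wiring are illustrative assumptions, not the paper's exact architecture; the real model also includes PET synthesis, modality-specific encoders, and channel aggregation, which are omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(query, context):
    # Scaled dot-product attention: query tokens attend over context tokens.
    scores = query @ context.T / np.sqrt(query.shape[-1])   # (n_q, n_ctx)
    return softmax(scores, axis=-1) @ context               # (n_q, d)

rng = np.random.default_rng(0)
d = 16                                  # feature dimension per token (assumed)
mri = rng.standard_normal((8, d))       # MRI encoder tokens
pet = rng.standard_normal((8, d))       # PET tokens (real, or synthesized from MRI)
cli = rng.standard_normal((4, d))       # clinical/tabular tokens

# Each modality attends to the concatenation of the other two modalities,
# then its attended tokens are mean-pooled; the three pooled vectors are
# concatenated into a single fused representation for the classifier head.
fused = np.concatenate([
    co_attention(mri, np.concatenate([pet, cli])).mean(axis=0),
    co_attention(pet, np.concatenate([mri, cli])).mean(axis=0),
    co_attention(cli, np.concatenate([mri, pet])).mean(axis=0),
])
print(fused.shape)  # (48,) — 3 modalities x d features
```

In a trained network the attention projections (query/key/value weights) would be learned parameters rather than raw features, but the data flow, cross-modal attention followed by pooling and concatenation, is the same.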