🤖 AI Summary
This study aims to model cognitive decline trajectories in Alzheimer’s disease (AD) more accurately, in support of early risk stratification and personalized management. We compare tabular (clinical and volumetric) predictors with imaging-based representations using two components: (1) a trajectory-aware labeling strategy that clusters longitudinal cognitive trajectories with Dynamic Time Warping (DTW), and (2) a 3D Vision Transformer (ViT) pretrained by unsupervised reconstruction on harmonized and augmented MRI data, which yields anatomy-preserving embeddings without requiring progression labels. The embeddings are evaluated with both traditional machine learning classifiers and deep learning heads, and compared against tabular representations and convolutional network baselines. Clinical and volumetric features achieve the highest AUCs, around 0.70, for predicting mild and severe progression, while ViT-derived embeddings reach an AUC of 0.71 for identifying cognitively stable individuals. All approaches struggle on the heterogeneous moderate-progression group. The two modalities therefore carry complementary information, which motivates multimodal fusion strategies for AD progression modeling.
📝 Abstract
Accurate modeling of cognitive decline in Alzheimer's disease is essential for early stratification and personalized management. While tabular predictors provide robust markers of global risk, their ability to capture subtle brain changes remains limited. In this study, we evaluate the predictive contributions of tabular and imaging-based representations, with a focus on transformer-derived Magnetic Resonance Imaging (MRI) embeddings. We introduce a trajectory-aware labeling strategy based on Dynamic Time Warping clustering to capture heterogeneous patterns of cognitive change, and train a 3D Vision Transformer (ViT) via unsupervised reconstruction on harmonized and augmented MRI data to obtain anatomy-preserving embeddings without progression labels. The pretrained encoder embeddings are subsequently assessed using both traditional machine learning classifiers and deep learning heads, and compared against tabular representations and convolutional network baselines. Results highlight complementary strengths across modalities. Clinical and volumetric features achieved the highest AUCs of around 0.70 for predicting mild and severe progression, underscoring their utility in capturing global decline trajectories. In contrast, MRI embeddings from the ViT model were most effective in distinguishing cognitively stable individuals with an AUC of 0.71. However, all approaches struggled in the heterogeneous moderate group. These findings indicate that clinical features excel in identifying high-risk extremes, whereas transformer-based MRI embeddings are more sensitive to subtle markers of stability, motivating multimodal fusion strategies for AD progression modeling.
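To make the trajectory-aware labeling step more concrete, the following is a minimal sketch of DTW-based clustering of cognitive score sequences. It is not the authors' implementation: the abstract does not state which clustering algorithm, cognitive measure, or number of clusters was used, so the k-medoids procedure, the naive initialization, the toy scores, and the names `dtw_distance`, `k_medoids`, and `trajectories` are all assumptions chosen for illustration.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def k_medoids(series, k, n_iter=20):
    """Alternating k-medoids on a precomputed DTW distance matrix.

    Assumption: this stands in for whatever clustering the paper applies
    on top of DTW; the abstract does not name the algorithm.
    """
    n = len(series)
    dist = [[dtw_distance(series[i], series[j]) for j in range(n)]
            for i in range(n)]
    medoids = list(range(k))  # naive init: the first k series (assumption)
    labels = [0] * n
    for _ in range(n_iter):
        # Assign every series to its nearest medoid under DTW.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Move each medoid to the member with the smallest total distance.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(
                min(members, key=lambda i: sum(dist[i][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Hypothetical cognitive scores per visit; subjects differ in visit count,
# which DTW handles without resampling to a common length.
trajectories = [
    [29, 29, 28, 29, 29],      # roughly stable
    [28, 26, 23, 19, 14],      # steep decline
    [29, 28, 29, 29],          # roughly stable, fewer visits
    [27, 25, 21, 16],          # steep decline, fewer visits
    [30, 29, 29, 30, 29, 29],  # roughly stable
    [28, 27, 24, 20, 15, 12],  # steep decline
]

labels, medoids = k_medoids(trajectories, k=2)
print(labels, medoids)
```

The resulting cluster indices would play the role of progression labels for the downstream classifiers. In practice a library implementation with a better initialization and a principled choice of the number of clusters would be preferable to this toy version.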