🤖 AI Summary
This work addresses three key limitations of existing structural MRI (sMRI) representation methods (high computational cost, lack of cross-slice contextual integration, and insufficient discriminative power) by proposing a Multimodal Visual Surrogate Compression (MVSC) framework. MVSC compresses 3D sMRI volumes into compact 2D visual surrogates compatible with frozen 2D foundation models such as DINO. The approach has two core components: a text-guided 3D volume context encoder and an adaptive slice fusion module, which together preserve global cross-slice context while compressing the volume. Evaluated on three large-scale Alzheimer’s disease datasets, the method compares favourably with state-of-the-art approaches on both binary and multi-class classification tasks.
📝 Abstract
High-dimensional structural MRI (sMRI) images are widely used for Alzheimer's disease (AD) diagnosis. Most existing methods for sMRI representation learning rely on 3D architectures (e.g., 3D CNNs), on slice-wise feature extraction with late aggregation, or on training-free feature extraction with 2D foundation models (e.g., DINO). However, these three paradigms suffer from high computational cost, loss of cross-slice relations, and limited ability to extract discriminative features, respectively. To address these challenges, we propose Multimodal Visual Surrogate Compression (MVSC). It learns to compress and adapt large 3D sMRI volumes into compact 2D features, termed visual surrogates, which are better aligned with frozen 2D foundation models and allow them to extract powerful representations for final AD classification. MVSC has two key components: a Volume Context Encoder that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner. Extensive experiments on three large-scale Alzheimer's disease benchmarks demonstrate that MVSC performs favourably against state-of-the-art methods on both binary and multi-class classification tasks.
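To make the idea of patch-wise slice aggregation concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that each slice receives one relevance score per patch, that scores are normalised across slices with a softmax, and that the 2D output is the resulting weighted sum of slices. The names `fuse_slices` and `slice_scores` are hypothetical, and the learned, text-guided scoring described in the abstract is replaced here by hand-written scores.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_slices(volume, slice_scores, patch=2):
    """Collapse volume[S][H][W] into a single H x W image.

    slice_scores[s][i][j] is a hypothetical relevance score for patch (i, j)
    of slice s. In a learned model these scores would come from a trained
    module; here they are supplied by hand (an assumption of this sketch).
    """
    S, H, W = len(volume), len(volume[0]), len(volume[0][0])
    assert H % patch == 0 and W % patch == 0, "H and W must be divisible by patch"
    fused = [[0.0] * W for _ in range(H)]
    for i in range(H // patch):
        for j in range(W // patch):
            # One weight per slice for this patch, summing to 1.
            weights = softmax([slice_scores[s][i][j] for s in range(S)])
            for y in range(i * patch, (i + 1) * patch):
                for x in range(j * patch, (j + 1) * patch):
                    fused[y][x] = sum(weights[s] * volume[s][y][x] for s in range(S))
    return fused


# Toy volume: 3 slices of 4x4 pixels, slice s filled with the constant value s.
volume = [[[float(s)] * 4 for _ in range(4)] for s in range(3)]

# Equal scores everywhere, so every patch is a plain average over slices.
uniform = [[[0.0] * 2 for _ in range(2)] for _ in range(3)]
surrogate = fuse_slices(volume, uniform)

# Strongly favour the last slice in the top-left patch only.
peaked = [[[0.0] * 2 for _ in range(2)] for _ in range(3)]
peaked[2][0][0] = 50.0
surrogate_peaked = fuse_slices(volume, peaked)
```

In this sketch, each patch of the output can draw on a different mixture of slices, which is one simple way a single 2D image could retain information from several depths of the volume. The resulting 2D array could then, in principle, be passed to a frozen 2D feature extractor; that step is omitted here.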