🤖 AI Summary
Existing neuroimaging foundation models have very large parameter counts and depend heavily on large-scale datasets, which limits their generalizability and deployability. This paper introduces BrainSymphony, a lightweight multimodal foundation model that can be pre-trained efficiently on comparatively small, publicly available datasets. For fMRI, it uses parallel spatial and temporal Transformer streams whose outputs are distilled into a unified representation by a Perceiver module; for dMRI, it uses a signed graph Transformer to encode structural connectivity. An adaptive fusion gate then integrates the two modality-specific representations, giving joint functional-structural modeling within a compact parameter budget. Empirically, BrainSymphony outperforms larger models on classification, prediction, and unsupervised brain network identification tasks. In addition, attention maps computed on an external psilocybin fMRI dataset (pre- and post-administration) reveal new patterns in brain dynamics. The authors argue that architecturally aware multimodal models can surpass their larger counterparts, making neuroimaging AI more interpretable and accessible.
📝 Abstract
Existing foundation models for neuroimaging are often prohibitively large and data-intensive. We introduce BrainSymphony, a lightweight, parameter-efficient foundation model that achieves state-of-the-art performance while being pre-trained on significantly smaller public datasets. BrainSymphony's multimodal architecture processes functional MRI data through parallel spatial and temporal transformer streams, which are then efficiently distilled into a unified representation by a Perceiver module. Concurrently, it models structural connectivity from diffusion MRI using a novel signed graph transformer to encode the brain's anatomical structure. These modality-specific representations are then integrated via an adaptive fusion gate. Despite its compact design, our model consistently outperforms larger models on a diverse range of downstream benchmarks, including classification, prediction, and unsupervised network identification tasks. Furthermore, our model revealed novel insights into brain dynamics using attention maps on a unique external psilocybin neuroimaging dataset (pre- and post-administration). BrainSymphony establishes that architecturally aware, multimodal models can surpass their larger counterparts, paving the way for more accessible and powerful research in computational neuroscience.
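
To make the idea of an adaptive fusion gate concrete, here is a minimal sketch in plain Python. The abstract does not specify the gate's formula, so everything below is an assumption for illustration: a per-dimension sigmoid gate computed from the concatenated functional and structural embeddings, used to take a convex combination of the two. The names `init_gate`, `gated_fusion`, `h_func`, and `h_struct` are hypothetical and are not taken from the paper or its code.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def init_gate(dim: int, seed: int = 0):
    """Random gate parameters (hypothetical): a dim x 2*dim weight matrix and a zero bias."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
    bias = [0.0] * dim
    return weights, bias


def gated_fusion(h_func, h_struct, weights, bias):
    """Fuse two same-sized embeddings with a learned per-dimension gate.

    Assumed form (not confirmed by the source):
        g     = sigmoid(W @ [h_func; h_struct] + b)
        fused = g * h_func + (1 - g) * h_struct
    """
    if len(h_func) != len(h_struct):
        raise ValueError("both embeddings must have the same dimension")
    concat = list(h_func) + list(h_struct)
    fused, gate = [], []
    for row, b, f, s in zip(weights, bias, h_func, h_struct):
        g = sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
        gate.append(g)
        # Convex combination: g near 1 favors the functional embedding,
        # g near 0 favors the structural embedding.
        fused.append(g * f + (1.0 - g) * s)
    return fused, gate


if __name__ == "__main__":
    h_func = [0.5, -1.0, 2.0, 0.0]    # stand-in for an fMRI-derived embedding
    h_struct = [1.5, 1.0, -2.0, 0.0]  # stand-in for a dMRI-derived embedding
    weights, bias = init_gate(dim=4)
    fused, gate = gated_fusion(h_func, h_struct, weights, bias)
    print("gate: ", [round(g, 3) for g in gate])
    print("fused:", [round(z, 3) for z in fused])
```

Because each gate value lies strictly between 0 and 1, every fused coordinate falls between the corresponding functional and structural values. In a trained model the gate parameters would be learned jointly with the rest of the network, and the actual BrainSymphony gate may differ in form and granularity.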