🤖 AI Summary
This study addresses limitations in the performance and interpretability of self-supervised representation learning for resting-state fMRI by proposing the Rhamba framework. Rhamba combines anatomy-guided, region-aware masking strategies (Any, Majority, and Pure) with hybrid Attention-Mamba architectures and introduces region-aligned patch embeddings. Rhamba is pretrained on ABIDE and fine-tuned on COBRE and ADHD-200 for schizophrenia and attention-deficit/hyperactivity disorder (ADHD) classification. It outperforms state-of-the-art methods in comparative evaluation, and its Mamba-Attention (MA) hybrid configuration achieves the highest average AUROC across both datasets. Integrated Gradients identifies the brain regions that drive model predictions. Region-wise analysis shows that peak performance depends on the interplay between masking strategy and architecture rather than on any single configuration. The framework thus aims to balance interpretability, scalability, and performance in fMRI representation learning.
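Integrated Gradients (IG) is the method credited with the interpretability analysis. It attributes the change in a model's output between a baseline input x' and the actual input x to individual input features. For feature i, it averages the gradient ∂f/∂x_i along the straight path x' + α(x − x') for α from 0 to 1, then multiplies by (x_i − x'_i). The sketch below is a generic illustration of the technique, not the authors' pipeline. It assumes a toy logistic-regression classifier over hypothetical region-level features, an all-zero baseline, and 200 midpoint integration steps. The paper does not specify any of these choices.

```python
import numpy as np

# Toy stand-in for a fine-tuned classifier: logistic regression over
# hypothetical region-level features. A real pipeline would obtain input
# gradients from the trained network by automatic differentiation.

def model(x, w, b):
    """Probability of the positive class for a single input vector x."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))


def model_grad(x, w, b):
    """Analytic gradient of model() with respect to x: p * (1 - p) * w."""
    p = model(x, w, b)
    return p * (1.0 - p) * w


def integrated_gradients(x, baseline, w, b, steps=200):
    """Integrated Gradients via a midpoint Riemann sum along the straight path."""
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, n_features)
    grads = np.stack([model_grad(point, w, b) for point in path])
    return (x - baseline) * grads.mean(axis=0)           # one score per feature


rng = np.random.default_rng(0)
w = rng.normal(size=5)          # hypothetical weights for 5 regions
x = rng.normal(size=5)          # hypothetical input features
baseline = np.zeros(5)          # assumed all-zero baseline
attr = integrated_gradients(x, baseline, w, 0.1)

# Completeness check: attributions should sum to f(x) - f(baseline).
print(attr.sum(), model(x, w, 0.1) - model(baseline, w, 0.1))
print("regions ranked by |attribution|:", np.argsort(-np.abs(attr)))
```

The two printed numbers should agree closely because of IG's completeness property: the attributions sum to the change in predicted probability. This is a useful sanity check when applying the same idea to a real network.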
📝 Abstract
Self-supervised pretraining is promising for large-scale neuroimaging, yet the impact of region-aware masking and hybrid sequence modeling remains underexplored. In this work, we introduce Rhamba, a region-aware pretraining framework that integrates anatomically guided masking with hybrid Attention-Mamba architectures for resting-state functional magnetic resonance imaging (fMRI) analysis. Models were pretrained on the ABIDE dataset using region-aligned patch embeddings and three masking strategies (Any, Majority, and Pure) with increasing spatial specificity. We evaluated four architectural variants: a Mamba-only model, an Alternate architecture with interleaved Mamba and Attention blocks, and two hybrid encoder-decoder configurations (Attention-Mamba (AM) and Mamba-Attention (MA)). The pretrained models were fine-tuned on downstream classification tasks using the COBRE and ADHD-200 datasets to discriminate schizophrenia and attention-deficit/hyperactivity disorder (ADHD), respectively. We employed Integrated Gradients, an explainable AI method, to identify the brain regions contributing to model predictions. Masking strategy strongly influenced reconstruction behavior, with reconstruction loss following a consistent ordering (Any > Majority > Pure). However, this trend did not directly translate into downstream performance, where differences were modest and dataset-dependent. The hybrid architecture with the MA configuration achieved the highest average AUROC across both datasets, and Rhamba outperformed state-of-the-art methods in comparative evaluation. Region-wise analysis showed that peak performance depends on the interaction between masking strategy and architecture rather than on a single dominant configuration. Overall, Rhamba offers a flexible framework for balancing interpretability, scalability, and performance in large-scale fMRI representation learning.
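The abstract names the three masking strategies and says they have increasing spatial specificity, but it does not define them. One plausible reading, assumed here purely for illustration and not taken from the paper, uses the fraction of a patch's voxels that fall inside the selected atlas region(s). Under this reading, a patch is masked when that fraction is greater than zero (Any), greater than one half (Majority), or equal to one (Pure). The sketch also assumes cubic, non-overlapping patches over a 3D atlas label volume. The function names, the `toy_atlas` helper, and the strict 0.5 threshold for Majority are all hypothetical.

```python
import numpy as np

# Hypothetical sketch: the three criteria below are an assumed reading of
# "Any", "Majority", and "Pure", not the paper's published definition.

def region_fraction_per_patch(atlas, regions, patch):
    """Fraction of voxels in each cubic patch whose atlas label is in `regions`.

    Returns a flat array with one entry per patch, in row-major patch order.
    """
    d, h, w = atlas.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("volume dimensions must be divisible by the patch size")
    in_region = np.isin(atlas, list(regions)).astype(float)
    # Split each axis into (num_patches, patch) and average within each patch.
    blocks = in_region.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    return blocks.mean(axis=(1, 3, 5)).ravel()


def region_mask(atlas, regions, patch, strategy):
    """Boolean mask over patches: True means the patch is masked."""
    frac = region_fraction_per_patch(atlas, regions, patch)
    if strategy == "any":        # patch touches the region at all
        return frac > 0.0
    if strategy == "majority":   # assumed: strictly more than half inside
        return frac > 0.5
    if strategy == "pure":       # patch lies entirely inside the region
        return frac == 1.0
    raise ValueError(f"unknown strategy: {strategy!r}")


def toy_atlas():
    """Hypothetical 4x4x4 label volume with region 1 spread unevenly over patches."""
    atlas = np.zeros((4, 4, 4), dtype=int)
    atlas[0:2, 0:2, 0:2] = 1   # one patch fully inside region 1
    atlas[2:4, 0:2, 0:2] = 1
    atlas[2, 0, 0] = 2         # so this patch is only 7/8 inside region 1
    atlas[0, 2, 0] = 1         # and a third patch is only 1/8 inside
    return atlas


atlas = toy_atlas()
for strategy in ("any", "majority", "pure"):
    mask = region_mask(atlas, {1}, patch=2, strategy=strategy)
    print(strategy, int(mask.sum()))   # any 3, majority 2, pure 1
```

Under this reading the masks are nested. Every Pure patch is also a Majority patch, and every Majority patch is also an Any patch. As a result, Any hides the most patches per selected region and Pure the fewest.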