MIRAGE: Adaptive Multimodal Gating for Whole-Brain fMRI Encoding

📅 2026-05-28

🤖 AI Summary

Most existing brain encoding models rely on unimodal representations and struggle to integrate visual, auditory, and linguistic information when predicting whole-brain responses to naturalistic audiovisual stimuli. To address this, the authors propose MIRAGE, which combines a natively multimodal backbone with adaptive feature gating across layers, a Transformer-based brain encoder, and subject-specific linear readout heads defined over cortical parcels. The approach achieves state-of-the-art performance in whole-brain fMRI response prediction, and controlled comparisons show that natively multimodal representations outperform post-hoc fusion of independent unimodal features. The learned gating weights are also directly interpretable: each sensory modality traces a distinct anatomical pattern across cortex.
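To make the idea of adaptive gating across backbone layers concrete, here is a minimal sketch. It assumes the gate is a softmax over one learnable logit per layer, which is a common way to build such a mechanism and not necessarily the formulation used in MIRAGE. The names `gate_layers`, `layer_features`, and `gate_logits` are hypothetical, and the feature values are toy numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gate_layers(layer_features, gate_logits):
    """Blend per-layer feature vectors using softmax gate weights.

    layer_features: one feature vector per backbone layer (same length each).
    gate_logits:    one scalar per layer (assumed learnable in a real model).
    Returns the fused vector and the gate weights, which can be inspected.
    """
    if len(layer_features) != len(gate_logits):
        raise ValueError("need exactly one gate logit per layer")
    weights = softmax(gate_logits)
    dim = len(layer_features[0])
    fused = [
        sum(w * feats[d] for w, feats in zip(weights, layer_features))
        for d in range(dim)
    ]
    return fused, weights


# Toy example: three layers, two-dimensional features, uniform logits.
layer_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gate_logits = [0.0, 0.0, 0.0]
fused, weights = gate_layers(layer_features, gate_logits)
```

Because the weights sum to one, they can be read as the relative contribution of each layer. Under this assumed formulation, that is the quantity one would inspect to interpret a gating profile.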

📝 Abstract

Recent progress in task-optimized neural networks has established encoding models as a powerful tool for predicting brain responses to naturalistic stimuli, yet most existing approaches rely on unimodal representations. The emergence of omni-modal foundation models and rich multimodal neural datasets enables encoding models that jointly integrate visual, auditory, and linguistic information across subjects. We introduce MIRAGE, a brain encoding framework for predicting whole-brain fMRI responses to naturalistic audiovisual stimuli. MIRAGE achieves state-of-the-art performance via a native multimodal backbone and adaptive feature gating across layers. These representations are then combined with a transformer-based brain encoder and a subject-specific linear head over the cortical parcels. Controlled comparisons show that natively multimodal features consistently outperform post-hoc aggregation of independent unimodal features, across architectural levels and backbones. Beyond predictive accuracy, the learned attention weights can be inspected directly to interpret the modality-specific gating profile over the backbone, and each modality traces a distinct anatomical pattern across cortex. Together, these results support adaptive layer-wise aggregation of natively multimodal features as a generalizable, interpretable, and accurate approach for whole-brain encoding.
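The subject-specific linear head over cortical parcels can be illustrated with a small sketch. It assumes a shared encoder has already produced one latent vector per time point, and that each subject owns a separate weight matrix and bias mapping that latent to parcel responses. The subject IDs, shapes, and numbers below are hypothetical and chosen only for illustration; the paper's actual dimensions and training procedure are not given here.

```python
def linear_readout(latent, weights, bias):
    """Apply one linear head: one output value per cortical parcel."""
    return [
        sum(w * x for w, x in zip(row, latent)) + b
        for row, b in zip(weights, bias)
    ]


# Hypothetical per-subject heads: 3 parcels, 2-dimensional shared latent.
subject_heads = {
    "sub-01": {
        "weights": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
        "bias": [0.0, 0.0, 0.1],
    },
    "sub-02": {
        "weights": [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]],
        "bias": [0.1, 0.0, 0.0],
    },
}


def predict_parcels(latent, subject_id, heads=subject_heads):
    """Route a shared latent vector through the head of the given subject."""
    head = heads[subject_id]
    return linear_readout(latent, head["weights"], head["bias"])


# The same latent yields different parcel predictions for each subject.
latent = [2.0, 4.0]
pred_01 = predict_parcels(latent, "sub-01")
pred_02 = predict_parcels(latent, "sub-02")
```

Keeping the encoder shared and only the final linear map subject-specific is one way to pool data across subjects while still accommodating individual differences in parcel responses.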

Problem

Research questions and friction points this paper is trying to address.

multimodal integration

brain encoding

fMRI

naturalistic stimuli

whole-brain prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal gating

whole-brain fMRI encoding

adaptive feature integration