MIRAGE: Adaptive Multimodal Gating for Whole-Brain fMRI Encoding

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a limitation of existing brain encoding models: most rely on unimodal representations and therefore struggle to integrate visual, auditory, and linguistic information when predicting whole-brain responses to naturalistic audiovisual stimuli. The authors propose MIRAGE, a framework that combines a natively multimodal backbone with adaptive feature gating across its layers, a Transformer-based brain encoder, and a subject-specific linear readout head over cortical parcels. The approach achieves state-of-the-art performance in whole-brain fMRI response prediction, and controlled comparisons show that natively multimodal representations outperform post-hoc fusion of independent unimodal features. The learned gating weights are also interpretable: each sensory modality traces a distinct anatomical pattern across cortex, offering insight into how multimodal information is processed differently across the brain.
📝 Abstract
Recent progress in task-optimized neural networks has established encoding models as a powerful tool for predicting brain responses to naturalistic stimuli, yet most existing approaches rely on unimodal representations. The emergence of omni-modal foundation models and rich multimodal neural datasets enables encoding models that jointly integrate visual, auditory, and linguistic information across subjects. We introduce MIRAGE, a brain encoding framework for predicting whole-brain fMRI responses to naturalistic audiovisual stimuli. MIRAGE achieves state-of-the-art performance via a native multimodal backbone and adaptive feature gating across layers. These representations are then combined with a transformer-based brain encoder and a subject-specific linear head over the cortical parcels. Controlled comparisons show that natively multimodal features consistently outperform post-hoc aggregation of independent unimodal features, across architectural levels and backbones. Beyond predictive accuracy, the learned attention weights can be inspected directly to read off each modality's gating profile over the backbone layers, and each modality traces a distinct anatomical pattern across cortex. Together, these results support adaptive layer-wise aggregation of natively multimodal features as a generalizable, interpretable, and accurate approach for whole-brain encoding.
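To make the idea of layer-wise gating followed by a subject-specific linear head more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the softmax form of the gate, the function names (`gate_layers`, `subject_readout`), the subject key `"sub-01"`, and all shapes are assumptions chosen for illustration, and the transformer-based brain encoder that the abstract places between the two steps is left out entirely.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gate_layers(layer_feats, gate_logits):
    """Weighted sum of backbone layers (hypothetical gating form).

    layer_feats: (n_layers, n_timepoints, dim) features from each layer.
    gate_logits: (n_layers,) learnable scores, one per layer.
    Returns the pooled features (n_timepoints, dim) and the gate weights.
    """
    weights = softmax(gate_logits)  # non-negative, sums to 1 over layers
    pooled = np.tensordot(weights, layer_feats, axes=(0, 0))
    return pooled, weights

def subject_readout(pooled, W, b):
    """Linear map from features to per-parcel responses for one subject."""
    return pooled @ W + b

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n_layers, n_timepoints, dim, n_parcels = 4, 10, 16, 100

layer_feats = rng.standard_normal((n_layers, n_timepoints, dim))
gate_logits = np.zeros(n_layers)  # untrained gate: uniform over layers

pooled, weights = gate_layers(layer_feats, gate_logits)

# One (W, b) pair per subject; the shared parts sit upstream of this head.
heads = {"sub-01": (rng.standard_normal((dim, n_parcels)) * 0.1,
                    np.zeros(n_parcels))}
pred = subject_readout(pooled, *heads["sub-01"])
```

With all logits at zero the gate reduces to a plain average over layers; after training, the entries of `weights` are the kind of quantity one would inspect to see which backbone depths a given modality draws on.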
Problem

Research questions and friction points this paper is trying to address.

multimodal integration
brain encoding
fMRI
naturalistic stimuli
whole-brain prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal gating
whole-brain fMRI encoding
adaptive feature integration
foundation models
interpretable attention