🤖 AI Summary
To address the limited accuracy of existing approaches in predicting neural responses to speech, this paper introduces a nonlinear multimodal prediction framework that integrates acoustic and linguistic features. Moving beyond conventional linear unimodal modeling, it nonlinearly aligns and fuses hierarchical representations from pretrained large models, LLaMA (semantic) and Whisper (acoustic), to uncover how auditory and semantic information is jointly encoded across motor, somatosensory, and higher-order language-related brain regions. The method combines nonlinear regression with multimodal feature alignment. In fMRI-based neural response prediction, it improves over traditional linear unimodal baselines by 17.2% (unnormalized correlation) and 17.9% (normalized correlation), and outperforms prior state-of-the-art models by 7.7% and 14.4%, respectively. This work establishes a novel paradigm for brain–language interfaces and neurosemantic modeling. A minimal code sketch of the overall approach follows below.
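As a concrete illustration of the pipeline described above, the sketch below fuses pre-extracted acoustic and semantic embeddings and fits a small nonlinear regressor to predict voxelwise fMRI responses. This is a minimal sketch, not the authors' code: feature extraction from Whisper and LLaMA is elided, random arrays stand in for real features and BOLD data, and the MLP regressor, layer sizes, and feature dimensions are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of a nonlinear multimodal encoding model.
# Acoustic (Whisper) and linguistic (LLaMA) features are assumed to be
# pre-extracted and resampled to the fMRI TR grid; random arrays stand in here.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trs, n_voxels = 1000, 50                           # toy fMRI time points and voxels
whisper_feats = rng.standard_normal((n_trs, 512))    # stand-in acoustic embeddings
llama_feats = rng.standard_normal((n_trs, 1024))     # stand-in semantic embeddings
bold = rng.standard_normal((n_trs, n_voxels))        # stand-in measured BOLD responses

# Multimodal fusion: per-modality standardization followed by concatenation.
X = np.hstack([StandardScaler().fit_transform(whisper_feats),
               StandardScaler().fit_transform(llama_feats)])

train, test = slice(0, 800), slice(800, None)
# Nonlinear mapping: a small MLP in place of the conventional linear regression.
model = MLPRegressor(hidden_layer_sizes=(256,), max_iter=300, random_state=0)
model.fit(X[train], bold[train])
pred = model.predict(X[test])

# Unnormalized score: Pearson correlation per voxel between prediction and data.
r = np.array([np.corrcoef(pred[:, v], bold[test, v])[0, 1] for v in range(n_voxels)])
print(f"mean voxelwise r = {r.mean():.3f}")
```

In practice the concatenation step would operate on time-aligned hidden states from specific layers of the two pretrained models rather than on random arrays, and the regressor would be cross-validated per voxel.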
📝 Abstract
Self-supervised language and audio models effectively predict brain responses to speech. However, traditional prediction models rely on linear mappings from unimodal features, even though speech comprehension involves the complex integration of auditory signals with linguistic and semantic information across widespread brain networks. Here, we introduce a nonlinear, multimodal prediction model that combines audio and linguistic features from pretrained models (e.g., LLaMA, Whisper). Our approach achieves improvements of 17.2% and 17.9% in prediction performance (unnormalized and normalized correlation, respectively) over traditional unimodal linear models, as well as improvements of 7.7% and 14.4%, respectively, over prior state-of-the-art models. These gains represent a major step toward robust in-silico testing and improved decoding performance. They also reveal how auditory and semantic information is fused in motor, somatosensory, and higher-level semantic regions, consistent with existing neurolinguistic theories. Overall, our work highlights the often-neglected potential of nonlinear and multimodal approaches to brain modeling, paving the way for future studies to adopt these strategies in naturalistic neurolinguistics research.
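For readers unfamiliar with the two reported metrics, the snippet below sketches one common way to compute an unnormalized correlation score and its noise-ceiling-normalized counterpart. The split-half, Spearman-Brown ceiling used here is a standard convention and is an assumption on our part; the paper's exact normalization may differ.

```python
# Sketch of the normalized-correlation metric (one common noise-ceiling
# definition; not necessarily the paper's exact formulation).
import numpy as np

def normalized_correlation(pred, reps):
    """pred: (time,) model prediction; reps: (n_reps, time) repeated measurements."""
    mean_resp = reps.mean(axis=0)
    r = np.corrcoef(pred, mean_resp)[0, 1]           # raw (unnormalized) correlation
    # Split-half reliability with Spearman-Brown correction as the noise ceiling.
    half = reps.shape[0] // 2
    r_half = np.corrcoef(reps[:half].mean(0), reps[half:].mean(0))[0, 1]
    ceiling = np.sqrt(max(2 * r_half / (1 + r_half), 1e-8))
    return r / ceiling

rng = np.random.default_rng(1)
signal = rng.standard_normal(500)                    # toy "true" response
reps = signal + 0.8 * rng.standard_normal((4, 500))  # four noisy repeated measurements
print(f"normalized r = {normalized_correlation(signal, reps):.3f}")
```

Normalizing by the ceiling expresses model performance as a fraction of the explainable (repeatable) signal, which is why the paper reports both variants of the correlation score.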