Stacked Regression using Off-the-shelf, Stimulus-tuned and Fine-tuned Neural Networks for Predicting fMRI Brain Responses to Movies (Algonauts 2025 Report)

📅 2025-10-02
🤖 AI Summary
This study addresses the prediction of fMRI brain responses to movie stimuli. We propose a multimodal representation fusion and ensemble learning framework that jointly leverages large language models (LLMs), video encoders, audio encoders, and vision-language models (VLMs), using both off-the-shelf and fine-tuned variants. To enrich textual semantics, we incorporate detailed movie transcripts and summaries as auxiliary inputs. We also explore stimulus-tuning and fine-tuning strategies for the language and vision models. Predictions from the individual models are integrated using stacked regression. In the Algonauts 2025 Challenge, our submission (team name Seinfeld) ranked 10th globally (Top 12%). All code and preprocessed resources are publicly released, supporting reproducible work on multimodal encoding models of brain activity.

📝 Abstract
We present our submission to the Algonauts 2025 Challenge, where the goal is to predict fMRI brain responses to movie stimuli. Our approach integrates multimodal representations from large language models, video encoders, audio models, and vision-language models, combining both off-the-shelf and fine-tuned variants. To improve performance, we enhanced textual inputs with detailed transcripts and summaries, and we explored stimulus-tuning and fine-tuning strategies for language and vision models. Predictions from individual models were combined using stacked regression, yielding solid results. Our submission, under the team name Seinfeld, ranked 10th. We make all code and resources publicly available, contributing to ongoing efforts in developing multimodal encoding models for brain activity.
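The encoding step described above can be illustrated with a minimal sketch. The page does not state how each model's features are mapped to fMRI responses; the sketch assumes the common choice of one ridge regression per feature set (one weight column per brain parcel), scored by the Pearson correlation between predicted and measured time courses. The function names `fit_ridge` and `pearson_per_parcel`, the regularization value, and the synthetic data are hypothetical stand-ins for real network features and fMRI recordings.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression, W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_timepoints, n_features) stimulus features from one network.
    Y: (n_timepoints, n_parcels) fMRI responses.
    Assumes X and Y are already centered, so no intercept is fitted.
    Returns W with shape (n_features, n_parcels): one weight column per parcel.
    """
    gram = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ Y)


def pearson_per_parcel(Y_true, Y_pred):
    """Pearson correlation between measured and predicted time courses, per parcel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    num = (yt * yp).sum(axis=0)
    den = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return num / den


# Synthetic stand-in data (hypothetical): a linear signal plus noise.
rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_parcels = 400, 100, 50, 20
W_true = rng.normal(size=(n_feat, n_parcels))
X = rng.normal(size=(n_train + n_test, n_feat))
Y = X @ W_true + rng.normal(scale=5.0, size=(n_train + n_test, n_parcels))

# Fit on the training segment, evaluate on the held-out segment.
W = fit_ridge(X[:n_train], Y[:n_train], alpha=10.0)
r = pearson_per_parcel(Y[n_train:], X[n_train:] @ W)
print(f"mean held-out r = {r.mean():.2f}")
```

In practice each feature set (language, video, audio, vision-language) would get its own regression, with the regularization strength chosen by cross-validation and features usually shifted by a few time points to account for the hemodynamic delay; these details are assumptions here, not taken from the submission.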

Problem

Predict fMRI brain responses to movie stimuli
Integrate multimodal representations from neural networks
Combine predictions using stacked regression techniques

Innovation

Combined multimodal representations from multiple neural networks
Enhanced textual inputs with detailed transcripts and summaries
Used stacked regression to fuse individual model predictions
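Stacked regression fuses the per-model predictions rather than the features. The page does not spell out the exact formulation the authors use, so the following is a sketch under assumptions: for each parcel it learns non-negative weights that sum to one over the individual models' predictions, minimizing squared error, and solves this constrained least-squares problem with projected gradient descent. The helper names `project_to_simplex` and `stack_weights` and the synthetic data are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]      # largest k satisfying the condition
    theta = (css[rho - 1] - 1.0) / rho        # shift that makes the result sum to one
    return np.maximum(v - theta, 0.0)


def stack_weights(P, y, n_iter=500):
    """Convex stacking weights for one parcel.

    P: (n_timepoints, n_models) held-out predictions, one column per model.
    y: (n_timepoints,) measured response of the parcel.
    Minimizes 0.5 * ||P @ w - y||^2 subject to w >= 0 and sum(w) == 1.
    """
    n_models = P.shape[1]
    w = np.full(n_models, 1.0 / n_models)             # start from the uniform average
    step = 1.0 / np.linalg.eigvalsh(P.T @ P).max()    # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = P.T @ (P @ w - y)
        w = project_to_simplex(w - step * grad)
    return w


# Synthetic example (hypothetical): three "models" of decreasing quality.
rng = np.random.default_rng(1)
n = 300
y = rng.normal(size=n)
preds = np.column_stack([
    y + rng.normal(scale=0.5, size=n),   # strong model
    y + rng.normal(scale=2.0, size=n),   # weak model
    rng.normal(size=n),                  # uninformative model
])
w = stack_weights(preds, y)
y_stacked = preds @ w
print(np.round(w, 2))
```

The weights should be fit on held-out (for example, cross-validated) predictions rather than in-sample fits, otherwise overfitted models receive inflated weights, and the procedure is repeated independently for every parcel. The sum-to-one constraint keeps the stacked prediction a convex combination, so each weight reads as that model's relative contribution; a variant with only non-negativity constraints is also common.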

Robert Scholz
Université Paris Cité, Universität Leipzig, Max Planck School of Cognition

Kunal Bagga
Independent Researcher

Christine Ahrends
Junior Research Fellow, Linacre College/FMRIB, University of Oxford

Carlo Alberto Barbano
University of Turin