🤖 AI Summary
This study addresses cross-subject prediction of fMRI brain responses to movie stimuli. We propose a multimodal representation fusion and ensemble learning framework that jointly leverages large language models (LLMs), video encoders, audio encoders, and vision-language models (VLMs). To enrich textual semantics, we incorporate movie transcripts and summaries as auxiliary inputs. Modality alignment is improved via stimulus-tuning and staged fine-tuning strategies. Predictions from individual modality-specific regressors are integrated using stacked regression. Evaluated on the Algonauts 2025 Challenge, our framework ranks 10th globally (top 12%), significantly outperforming unimodal baselines. All code and preprocessed resources are publicly released, establishing a reproducible paradigm for multimodal neural encoding and interpretable modeling of brain activity.
📝 Abstract
We present our submission to the Algonauts 2025 Challenge, where the goal is to predict fMRI brain responses to movie stimuli. Our approach integrates multimodal representations from large language models, video encoders, audio encoders, and vision-language models, combining both off-the-shelf and fine-tuned variants. To improve performance, we enriched textual inputs with detailed transcripts and summaries, and we explored stimulus-tuning and fine-tuning strategies for language and vision models. Predictions from individual models were combined using stacked regression, outperforming each unimodal baseline. Our submission, under the team name Seinfeld, ranked 10th overall. We make all code and resources publicly available, contributing to ongoing efforts in developing multimodal encoding models of brain activity.
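To make the stacked-regression step concrete, below is a minimal sketch of how per-modality predictions can be fused into a single fMRI prediction. It is an illustration under stated assumptions, not the authors' exact pipeline: the feature shapes, the choice of ridge regression, and the alpha values are all hypothetical placeholders.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

# Hypothetical inputs: one feature matrix per modality (n_TRs x n_features)
# and the target fMRI responses (n_TRs x n_parcels). Shapes are illustrative.
rng = np.random.default_rng(0)
features = {
    "text":  rng.standard_normal((600, 768)),   # e.g., LLM embeddings
    "video": rng.standard_normal((600, 512)),   # e.g., video-encoder features
    "audio": rng.standard_normal((600, 128)),   # e.g., audio-encoder features
}
fmri = rng.standard_normal((600, 1000))         # parcel-wise responses

# Level 1: fit one regressor per modality and collect out-of-fold
# predictions, so the stacker is trained on data the base models never saw.
oof_preds = [
    cross_val_predict(Ridge(alpha=1e3), X, fmri, cv=5)
    for X in features.values()
]
stacked_inputs = np.concatenate(oof_preds, axis=1)

# Level 2: a second ridge model learns how to weight each modality's
# predictions for every parcel (the "stacked regression" step).
stacker = Ridge(alpha=1.0).fit(stacked_inputs, fmri)

# At test time, base models refit on all training data produce predictions
# that are concatenated and passed through the stacker.
base_models = {m: Ridge(alpha=1e3).fit(X, fmri) for m, X in features.items()}
```

In practice, the regularization strengths would likely be tuned per parcel (e.g., with `RidgeCV`), and the base models refit on the full training set before the ensemble is applied to held-out movies.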