Deep Neural Encoder-Decoder Model to Relate fMRI Brain Activity with Naturalistic Stimuli

📅 2025-07-16
🤖 AI Summary
This study addresses the challenge of building interpretable deep neural network models for bidirectional (encoding and decoding) modeling of fMRI brain activity under natural movie stimulation. We propose a novel encoder–decoder architecture incorporating temporal dependencies via temporal convolutional layers to align the high frame rate of video stimuli with the low sampling rate of fMRI. The encoder predicts voxel-wise responses in visual cortices (e.g., V1–V4, LOC), while the decoder reconstructs the original visual input. To enhance interpretability, we integrate Gradient-weighted Class Activation Mapping (Grad-CAM) to generate saliency maps identifying brain regions critically driving reconstruction. Experimental results demonstrate accurate prediction of neural responses across hierarchical visual areas and successful reconstruction of perceptually salient features—including edges, contrast gradients, and facial structures—validating the model as an interpretable proxy for cortical visual processing mechanisms.

📝 Abstract
We propose an end-to-end deep neural encoder-decoder model to encode and decode brain activity in response to naturalistic stimuli using functional magnetic resonance imaging (fMRI) data. Leveraging temporally correlated input from consecutive film frames, we employ temporal convolutional layers in our architecture, which effectively allow us to bridge the temporal resolution gap between natural movie stimuli and fMRI acquisitions. Our model predicts the activity of voxels in and around the visual cortex and reconstructs the corresponding visual inputs from neural activity. Finally, we investigate the brain regions contributing to visual decoding through saliency maps. We find that the most contributing regions are the middle occipital area, the fusiform area, and the calcarine, which are respectively involved in shape perception, complex recognition (in particular face perception), and the processing of basic visual features such as edges and contrasts. The strong involvement of these functions is consistent with the decoder's capability to reconstruct edges, faces, and contrasts. All in all, this suggests the possibility of probing our understanding of visual processing in films by using the behaviour of deep learning models, such as the one proposed in this paper, as a proxy.
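To make the temporal resolution gap concrete, the sketch below shows how a strided convolution over time can turn a sequence of per-frame features into one value per fMRI volume. It is a minimal illustration, not the architecture from the paper: the function name `temporal_conv_downsample`, the ratio of 4 frames per volume, and the fixed averaging kernel are all assumptions chosen for readability. A trained temporal convolutional layer would learn its kernel weights instead.

```python
def temporal_conv_downsample(frame_features, kernel, stride):
    """Strided 1D convolution over time (valid padding, kernel not flipped)."""
    k = len(kernel)
    out = []
    for start in range(0, len(frame_features) - k + 1, stride):
        window = frame_features[start:start + k]
        out.append(sum(w * x for w, x in zip(kernel, window)))
    return out


# Hypothetical setup: 4 video frames are shown during each fMRI volume.
frames_per_volume = 4

# One scalar feature per frame; 12 frames correspond to 3 volumes.
frame_features = [float(i) for i in range(12)]

# Fixed averaging kernel, used here only as a stand-in for learned weights.
kernel = [1.0 / frames_per_volume] * frames_per_volume

volume_features = temporal_conv_downsample(
    frame_features, kernel, stride=frames_per_volume
)
print(volume_features)  # [1.5, 5.5, 9.5]
```

Setting the stride equal to the number of frames per volume yields exactly one output per acquisition, which is one simple way to align the two sampling rates.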
Problem

Research questions and friction points this paper is trying to address.

Encode and decode fMRI brain activity for naturalistic stimuli
Bridge temporal resolution gap between movie stimuli and fMRI
Identify brain regions contributing to visual decoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end deep neural encoder-decoder model
Temporal convolutional layers for fMRI data
Saliency maps to identify contributing brain regions
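The idea of using saliency maps to rank brain regions can be sketched with a plain gradient-based saliency score: measure how strongly a decoder output changes when each voxel is perturbed, then average those sensitivities per region. This is an illustrative assumption rather than the saliency method used in the paper. The toy decoder, its weights, the voxel values, and the region labels `region_a` and `region_b` are all hypothetical.

```python
import math


def toy_decoder(voxels, weights):
    """Hypothetical stand-in for a decoder: one scalar output from voxel activity."""
    return math.tanh(sum(w * v for w, v in zip(weights, voxels)))


def gradient_saliency(fn, voxels, eps=1e-5):
    """Absolute central-difference gradient of fn with respect to each voxel."""
    saliency = []
    for i in range(len(voxels)):
        up = list(voxels)
        up[i] += eps
        down = list(voxels)
        down[i] -= eps
        saliency.append(abs(fn(up) - fn(down)) / (2 * eps))
    return saliency


def mean_saliency_by_region(saliency, region_labels):
    """Average the per-voxel saliency within each labelled region."""
    totals, counts = {}, {}
    for s, region in zip(saliency, region_labels):
        totals[region] = totals.get(region, 0.0) + s
        counts[region] = counts.get(region, 0) + 1
    return {region: totals[region] / counts[region] for region in totals}


# Made-up values for illustration only; they are not results from the paper.
weights = [2.0, 1.5, 0.1, 0.0]
voxels = [0.1, -0.2, 0.3, 0.05]
labels = ["region_a", "region_a", "region_b", "region_b"]

saliency = gradient_saliency(lambda v: toy_decoder(v, weights), voxels)
region_scores = mean_saliency_by_region(saliency, labels)
top_region = max(region_scores, key=region_scores.get)
print(top_region)
```

With a real network, the finite differences would normally be replaced by gradients from automatic differentiation, and the region labels would come from an anatomical atlas.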
Florian David
Neuro-X Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland
Michael Chan
Neuro-X Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
Elenor Morgenroth
Neuro-X Institute, Ecole Polytechnique Fédérale de Lausanne (EPFL), Geneva, Switzerland; Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland; Swiss Center for Affective Sciences, University of Geneva, Geneva, Switzerland
Patrik Vuilleumier
University of Geneva
neuroscience, brain, neurology, cognition, emotion
Dimitri Van De Ville
Professor, Neuro-X Institute, EPF Lausanne and University of Geneva
image processing, neurosciences, medical imaging, brain function, graph signal processing