Decoding fMRI Data into Captions using Prefix Language Modeling

📅 2025-01-05
🤖 AI Summary
This study addresses the challenge of decoding fMRI brain activity directly into natural-language image descriptions, making neural decoding outputs easier to interpret and read. It proposes a cross-modal decoding pipeline that avoids contamination from COCO-derived training data: first, a 3D CNN models the spatial structure of voxels to project fMRI data into the DINOv2 image embedding space; second, the [CLS] token of the predicted embedding serves as a prefix for GPT-2, which generates the caption. Unlike prior pipelines that rely on linear regression and on COCO-trained captioning models (e.g., GIT), this approach avoids the training-data leakage risk and considerably reduces computational requirements. Evaluated on the Natural Scenes Dataset (NSD), the method is reported to improve caption semantic relevance, robustness to noise, and readability for middle-school students, and it is presented as the first to integrate DINOv2 visual representations into fMRI-to-text generation.
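The first step described above, mapping an fMRI volume to an image embedding with a 3D CNN, can be sketched in PyTorch as follows. This is a minimal illustration and not the paper's architecture: the class name `FMRIToEmbedding3D`, the layer sizes, the 16×16×16 input volume, and the mean-squared-error objective are all assumptions, and the 768-dimensional target assumes a ViT-B-sized DINOv2 embedding. The tensors are random placeholders rather than NSD data.

```python
import torch
import torch.nn as nn


class FMRIToEmbedding3D(nn.Module):
    """Hypothetical 3D CNN that regresses an image embedding from an fMRI volume."""

    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1),  # local voxel neighbourhoods
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(2),  # keep a coarse 2x2x2 spatial layout
        )
        self.head = nn.Linear(32 * 2 * 2 * 2, embed_dim)

    def forward(self, volume: torch.Tensor) -> torch.Tensor:
        # volume: (batch, 1, depth, height, width)
        x = self.features(volume).flatten(1)  # (batch, 256)
        return self.head(x)                   # (batch, embed_dim)


torch.manual_seed(0)
model = FMRIToEmbedding3D(embed_dim=768)

fmri = torch.randn(2, 1, 16, 16, 16)  # placeholder volumes, not real fMRI data
target = torch.randn(2, 768)          # placeholder for target [CLS] embeddings

pred = model(fmri)
loss = nn.functional.mse_loss(pred, target)  # assumed regression objective
loss.backward()
```

Compared with a linear regression over flattened voxels, the convolutions share weights across neighbouring voxels, which is one way to exploit their spatial arrangement.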

📝 Abstract
With advances in large language models and latent diffusion models, brain decoding has achieved remarkable results in recent years. Works on the NSD dataset, whose stimulus images come from the COCO dataset, leverage embeddings from the CLIP model for image reconstruction and the GIT model for captioning. However, the current captioning approach risks data contamination, given that GIT was trained on the COCO dataset. In this work, we present an alternative method for decoding brain signals into image captions: we predict a DINOv2 model's embedding of an image from the corresponding fMRI signal and then provide its [CLS] token as the prefix to the GPT-2 language model, which decreases computational requirements considerably. Additionally, instead of the commonly used linear regression, we explore a 3D convolutional neural network that maps fMRI signals to the image embedding space to better account for the positional information of voxels.
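The prefix step in the abstract can be illustrated with a short PyTorch sketch. It assumes, without confirmation from the source, that the predicted [CLS] embedding is passed through a learned linear projection (the hypothetical `PrefixMapper`) into a few prefix positions that are prepended to the caption's token embeddings. The prefix length, the small stand-in vocabulary, and the `nn.Embedding` table used in place of GPT-2's own embedding layer are illustrative choices; 768 is the hidden size of the smallest GPT-2.

```python
import torch
import torch.nn as nn

HIDDEN = 768      # hidden size of the smallest GPT-2
EMBED_DIM = 768   # assumed width of the predicted [CLS] embedding
PREFIX_LEN = 4    # hypothetical number of prefix positions
VOCAB = 1000      # small stand-in vocabulary (GPT-2 itself uses 50257 tokens)


class PrefixMapper(nn.Module):
    """Hypothetical projection from one image embedding to a sequence of prefix vectors."""

    def __init__(self, embed_dim: int, hidden: int, prefix_len: int):
        super().__init__()
        self.prefix_len = prefix_len
        self.hidden = hidden
        self.proj = nn.Linear(embed_dim, prefix_len * hidden)

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        # (batch, embed_dim) -> (batch, prefix_len, hidden)
        return self.proj(cls_embedding).view(-1, self.prefix_len, self.hidden)


torch.manual_seed(0)
token_embedding = nn.Embedding(VOCAB, HIDDEN)  # stand-in for the LM's embedding table
mapper = PrefixMapper(EMBED_DIM, HIDDEN, PREFIX_LEN)

cls_embedding = torch.randn(2, EMBED_DIM)        # placeholder predicted embeddings
caption_ids = torch.randint(0, VOCAB, (2, 10))   # placeholder caption token ids

prefix = mapper(cls_embedding)                   # (2, 4, 768)
caption = token_embedding(caption_ids)           # (2, 10, 768)
inputs_embeds = torch.cat([prefix, caption], dim=1)  # (2, 14, 768)

# Prefix positions carry no target token, so they are masked out of the loss.
ignore = torch.full((2, PREFIX_LEN), -100, dtype=torch.long)
labels = torch.cat([ignore, caption_ids], dim=1)     # (2, 14)
```

With the Hugging Face `transformers` library, tensors shaped like these could be passed to `GPT2LMHeadModel` through its `inputs_embeds` and `labels` arguments, where label value -100 is ignored by the loss. Because the language model only sees a handful of extra positions, the conditioning adds little compute on top of ordinary text generation.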
Problem

Research questions and friction points this paper is trying to address.

Brain Imaging
Simplification Algorithm
Comprehensible Description
Innovation

Methods, ideas, or system contributions that make the work stand out.

DINOv2
3D Convolutional Neural Network
GPT-2