🤖 AI Summary
fMRI–text paired data are extremely scarce, hindering the development of general-purpose, cross-modal alignment models between neural activity and semantic cognition.
Method: We propose a three-stage framework: (1) a neuro-tokenizer that maps fMRI voxel sequences into language-like tokens; (2) construction of a large-scale descriptive fMRI corpus to bridge the modality gap; and (3) end-to-end sequence-level fMRI-to-language alignment via LLM integration, parameter-efficient LoRA fine-tuning, and multi-task instruction tuning.
Contribution/Results: This work introduces the first general-purpose fMRI–language foundation model enabling deep semantic coupling between fMRI signals and large language models. It achieves state-of-the-art performance on zero-shot and few-shot fMRI decoding, as well as on semantic retrieval benchmarks, significantly outperforming prior methods. Our approach establishes a scalable, generalizable paradigm for brain–language interfaces, advancing foundational modeling of neural semantics.
📝 Abstract
Recent advances in multimodal large language models (LLMs) have enabled unified reasoning across images, audio, and video, but extending such capability to brain imaging remains largely unexplored. Bridging this gap is essential to link neural activity with semantic cognition and to develop cross-modal brain representations. To this end, we present fMRI-LM, a foundation model that bridges functional MRI (fMRI) and language through a three-stage framework. In Stage 1, we learn a neural tokenizer that maps fMRI into discrete tokens embedded in a language-consistent space. In Stage 2, a pretrained LLM is adapted to jointly model fMRI tokens and text, treating brain activity as a sequence that can be temporally predicted and linguistically described. To overcome the lack of natural fMRI–text pairs, we construct a large descriptive corpus that translates diverse imaging-based features into structured textual descriptors, capturing the low-level organization of fMRI signals. In Stage 3, we perform multi-task, multi-paradigm instruction tuning to endow fMRI-LM with high-level semantic understanding, supporting diverse downstream applications. Across various benchmarks, fMRI-LM achieves strong zero-shot and few-shot performance, and adapts efficiently with parameter-efficient tuning (LoRA), establishing a scalable pathway toward a language-aligned, universal model for structural and semantic understanding of fMRI.
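The abstract does not specify how the Stage-1 neural tokenizer is implemented, but its core idea, mapping continuous fMRI features to discrete, language-like tokens, can be sketched with a generic vector-quantization step: each timepoint's feature vector is assigned the index of its nearest entry in a learned codebook. Everything below (function names, the toy codebook, the 2-D features) is a hypothetical illustration, not the paper's method:

```python
import math

def quantize(features, codebook):
    """Assign each feature vector the index of its nearest codebook entry.

    features: list of per-timepoint fMRI feature vectors (toy stand-ins).
    codebook: list of learned code vectors; their indices act as the
        discrete "tokens" a language model could consume alongside text.
    """
    tokens = []
    for f in features:
        # Euclidean distance to every code; pick the closest one's index.
        dists = [math.dist(f, c) for c in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

# Toy example: 3 timepoints, 2-D features, 2-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, -0.2], [0.9, 1.1], [0.4, 0.7]]
print(quantize(features, codebook))  # → [0, 1, 1]
```

In a real system the codebook would be learned jointly with an encoder (as in VQ-style tokenizers) and embedded in a space aligned with the LLM's token embeddings; this sketch only shows the discretization step itself.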