🤖 AI Summary
fMRI–text paired data are extremely scarce, hindering the development of general-purpose, cross-modal alignment models between neural activity and semantic cognition.
Method: We propose a three-stage framework: (1) a neuro-tokenizer that maps fMRI voxel sequences into language-like tokens; (2) construction of a large-scale descriptive fMRI corpus to bridge the modality gap; and (3) end-to-end sequence-level fMRI-to-language alignment via LLM integration, parameter-efficient LoRA fine-tuning, and multi-task instruction tuning.
Contribution/Results: This work introduces the first general-purpose fMRI–language foundation model enabling deep semantic coupling between fMRI signals and large language models. It achieves state-of-the-art performance on zero-shot and few-shot fMRI decoding, as well as on semantic retrieval benchmarks, significantly outperforming prior methods. Our approach establishes a scalable, generalizable paradigm for brain–language interfaces, advancing foundational modeling of neural semantics.
📝 Abstract
Recent advances in multimodal large language models (LLMs) have enabled unified reasoning across images, audio, and video, but extending such capability to brain imaging remains largely unexplored. Bridging this gap is essential to link neural activity with semantic cognition and to develop cross-modal brain representations. To this end, we present fMRI-LM, a foundation model that bridges functional MRI (fMRI) and language through a three-stage framework. In Stage 1, we learn a neural tokenizer that maps fMRI into discrete tokens embedded in a language-consistent space. In Stage 2, a pretrained LLM is adapted to jointly model fMRI tokens and text, treating brain activity as a sequence that can be temporally predicted and linguistically described. To overcome the lack of natural fMRI–text pairs, we construct a large descriptive corpus that translates diverse imaging-based features into structured textual descriptors, capturing the low-level organization of fMRI signals. In Stage 3, we perform multi-task, multi-paradigm instruction tuning to endow fMRI-LM with high-level semantic understanding, supporting diverse downstream applications. Across various benchmarks, fMRI-LM achieves strong zero-shot and few-shot performance, and adapts efficiently with parameter-efficient tuning (LoRA), establishing a scalable pathway toward a language-aligned, universal model for structural and semantic understanding of fMRI.
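The abstract does not specify how the Stage-1 neural tokenizer is implemented, but its core idea, mapping continuous fMRI features to discrete, language-like tokens, can be sketched with a generic vector-quantization step: each timepoint's feature vector is assigned the index of its nearest entry in a learned codebook. Everything below (function names, the toy codebook, the 2-D features) is a hypothetical illustration, not the paper's method:

```python
import math

def quantize(features, codebook):
    """Assign each feature vector the index of its nearest codebook entry.

    features: list of per-timepoint fMRI feature vectors (toy stand-ins).
    codebook: list of learned code vectors; their indices act as the
        discrete "tokens" a language model could consume alongside text.
    """
    tokens = []
    for f in features:
        # Euclidean distance to every code; pick the closest one's index.
        dists = [math.dist(f, c) for c in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

# Toy example: 3 timepoints, 2-D features, 2-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, -0.2], [0.9, 1.1], [0.4, 0.7]]
print(quantize(features, codebook))  # → [0, 1, 1]
```

In a real system the codebook would be learned jointly with an encoder (as in VQ-style tokenizers) and embedded in a space aligned with the LLM's token embeddings; this sketch only shows the discretization step itself.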