🤖 AI Summary
This study addresses two key limitations in fMRI-based brain decoding: overreliance on visual reconstruction and insufficient neuroscientific interpretation of semantic processing. We propose the first end-to-end fMRI-to-text semantic decoding paradigm, one that bypasses intermediate visual reconstruction entirely. Methodologically, we integrate cross-modal semantic alignment training, deep learning–based fMRI signal modeling, and fine-grained neuroanatomical localization analysis. Key contributions: (1) the first systematic demonstration of cooperative involvement of MT+, ventral visual cortex, and the inferior parietal lobule in the vision-to-semantic transformation; (2) empirical validation of separable neural representations for high-level semantic dimensions, including animacy and motion; and (3) state-of-the-art semantic decoding performance, with generated text accurately capturing scene-level semantics. Our framework establishes a high-fidelity, interpretable observational pathway for investigating cortical semantic encoding mechanisms.
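Since the summary names cross-modal semantic alignment as the core training step, the sketch below illustrates one common form such an objective can take: a symmetric InfoNCE (CLIP-style) contrastive loss that pulls each fMRI trial's embedding toward the embedding of its paired caption. This is a minimal illustration under stated assumptions, not the paper's actual implementation; the encoder architecture, the voxel count (8,000), the embedding width (512), and the use of precomputed caption embeddings are all hypothetical choices.

```python
# Hypothetical sketch of cross-modal semantic alignment (not the paper's code):
# an fMRI encoder is trained so that each trial's embedding matches the
# embedding of the caption describing the viewed image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FMRIEncoder(nn.Module):
    """Maps a flattened fMRI voxel pattern into a shared semantic space."""
    def __init__(self, n_voxels: int, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, 2048),
            nn.GELU(),
            nn.Linear(2048, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def alignment_loss(brain_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (fMRI, caption) pairs attract, mismatched pairs repel."""
    logits = brain_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 32 trials; voxel count and embedding width are arbitrary here.
encoder = FMRIEncoder(n_voxels=8000)
fmri = torch.randn(32, 8000)                      # stand-in fMRI patterns
caps = F.normalize(torch.randn(32, 512), dim=-1)  # stand-in caption embeddings
loss = alignment_loss(encoder(fmri), caps)
loss.backward()
```

Note that no image pixels enter this objective: the brain signal is aligned directly to text, which is what lets the pipeline bypass visual reconstruction.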
📝 Abstract
Deciphering the neural mechanisms that transform sensory experiences into meaningful semantic representations is a fundamental challenge in cognitive neuroscience. While neuroimaging has mapped a distributed semantic network, the format and neural code of semantic content remain elusive, particularly for complex, naturalistic stimuli. Traditional brain decoding, focused on visual reconstruction, primarily captures low-level perceptual features, missing the deeper semantic content that guides human cognition. Here, we introduce a paradigm shift by directly decoding fMRI signals into textual descriptions of viewed natural images. Our deep learning model, trained without visual input, achieves state-of-the-art semantic decoding performance, generating meaningful captions that capture the core semantic content of complex scenes. Neuroanatomical analysis reveals the critical role of higher-level visual regions, including MT+, ventral stream visual cortex, and inferior parietal cortex, in this semantic transformation. Category-specific decoding further demonstrates nuanced neural representations for semantic dimensions such as animacy and motion. This text-based decoding approach provides a more direct and interpretable window into the brain's semantic encoding than visual reconstruction, offering a powerful new methodology for probing the neural basis of complex semantic processing, refining our understanding of the distributed semantic network, and potentially informing the design of brain-inspired language models.
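To make "directly decoding fMRI signals into textual descriptions" concrete, here is a toy generation stage: an autoregressive decoder seeded with the aligned brain embedding that emits caption tokens greedily. Everything in this sketch is an illustrative assumption (the GRU architecture, vocabulary size, and special token ids); the paper's actual decoder is not specified here.

```python
# Hypothetical caption-generation stage (not the paper's code): the aligned
# fMRI embedding seeds the decoder's hidden state, and tokens are produced
# greedily until <eos> or a length limit.
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size: int = 1000, emb_dim: int = 512, hid_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, vocab_size)

    @torch.no_grad()
    def generate(self, brain_emb, bos_id=1, eos_id=2, max_len=20):
        h = brain_emb.unsqueeze(0)                 # (1, batch, hid_dim) initial state
        tok = torch.full((brain_emb.size(0), 1), bos_id, dtype=torch.long)
        out = []
        for _ in range(max_len):
            x = self.embed(tok)                    # (batch, 1, emb_dim)
            y, h = self.gru(x, h)
            tok = self.head(y[:, -1]).argmax(-1, keepdim=True)  # greedy next token
            out.append(tok)
            if (tok == eos_id).all():              # every caption finished
                break
        return torch.cat(out, dim=1)               # generated token ids

decoder = CaptionDecoder()
brain_emb = torch.randn(4, 512)        # stand-in for aligned fMRI embeddings
token_ids = decoder.generate(brain_emb)  # map ids back to words with a vocabulary
```

In a real pipeline the two stages would be composed: the contrastively trained fMRI encoder produces `brain_emb`, and decoding quality is then scored against reference captions with standard text metrics.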