🤖 AI Summary
This study addresses two key limitations in fMRI-based brain decoding: overreliance on visual reconstruction and insufficient neuroscientific interpretation of semantic processing. We propose the first end-to-end fMRI-to-text semantic decoding paradigm, one that bypasses intermediate visual reconstruction entirely. Methodologically, we integrate cross-modal semantic alignment training, deep learning–based fMRI signal modeling, and fine-grained neuroanatomical localization analysis. Key contributions: (1) the first systematic demonstration of cooperative involvement of MT+, ventral visual cortex, and the inferior parietal lobule in the vision-to-semantic transformation; (2) empirical validation of separable neural representations for high-level semantic dimensions, including animacy and motion; and (3) state-of-the-art semantic decoding performance, with generated text accurately capturing scene-level semantics. Our framework establishes a high-fidelity, interpretable observational pathway for investigating cortical semantic encoding mechanisms.
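Since the summary names cross-modal semantic alignment as the core training step, the sketch below illustrates one common form such an objective can take: a symmetric InfoNCE (CLIP-style) contrastive loss that pulls each fMRI trial's embedding toward the embedding of its paired caption. This is a minimal illustration under stated assumptions, not the paper's actual implementation; the encoder architecture, the voxel count (8,000), the embedding width (512), and the use of precomputed caption embeddings are all hypothetical choices.

```python
# Hypothetical sketch of cross-modal semantic alignment (not the paper's code):
# an fMRI encoder is trained so that each trial's embedding matches the
# embedding of the caption describing the viewed image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FMRIEncoder(nn.Module):
    """Maps a flattened fMRI voxel pattern into a shared semantic space."""
    def __init__(self, n_voxels: int, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, 2048),
            nn.GELU(),
            nn.Linear(2048, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def alignment_loss(brain_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (fMRI, caption) pairs attract, mismatched pairs repel."""
    logits = brain_emb @ text_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 32 trials; voxel count and embedding width are arbitrary here.
encoder = FMRIEncoder(n_voxels=8000)
fmri = torch.randn(32, 8000)                      # stand-in fMRI patterns
caps = F.normalize(torch.randn(32, 512), dim=-1)  # stand-in caption embeddings
loss = alignment_loss(encoder(fmri), caps)
loss.backward()
```

Note that no image pixels enter this objective: the brain signal is aligned directly to text, which is what lets the pipeline bypass visual reconstruction.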
📝 Abstract
Deciphering the neural mechanisms that transform sensory experiences into meaningful semantic representations is a fundamental challenge in cognitive neuroscience. While neuroimaging has mapped a distributed semantic network, the format and neural code of semantic content remain elusive, particularly for complex, naturalistic stimuli. Traditional brain decoding, focused on visual reconstruction, primarily captures low-level perceptual features, missing the deeper semantic content that guides human cognition. Here, we introduce a paradigm shift by directly decoding fMRI signals into textual descriptions of viewed natural images. Our deep learning model, trained without visual input, achieves state-of-the-art semantic decoding performance, generating meaningful captions that capture the core semantic content of complex scenes. Neuroanatomical analysis reveals the critical role of higher-level visual regions, including MT+, ventral stream visual cortex, and inferior parietal cortex, in this semantic transformation. Category-specific decoding further demonstrates nuanced neural representations for semantic dimensions such as animacy and motion. This text-based decoding approach provides a more direct and interpretable window into the brain's semantic encoding than visual reconstruction, offering a powerful new methodology for probing the neural basis of complex semantic processing, refining our understanding of the distributed semantic network, and potentially informing the design of brain-inspired language models.
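To make "directly decoding fMRI signals into textual descriptions" concrete, here is a toy generation stage: an autoregressive decoder seeded with the aligned brain embedding that emits caption tokens greedily. Everything in this sketch is an illustrative assumption (the GRU architecture, vocabulary size, and special token ids); the paper's actual decoder is not specified here.

```python
# Hypothetical caption-generation stage (not the paper's code): the aligned
# fMRI embedding seeds the decoder's hidden state, and tokens are produced
# greedily until <eos> or a length limit.
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    def __init__(self, vocab_size: int = 1000, emb_dim: int = 512, hid_dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, vocab_size)

    @torch.no_grad()
    def generate(self, brain_emb, bos_id=1, eos_id=2, max_len=20):
        h = brain_emb.unsqueeze(0)                 # (1, batch, hid_dim) initial state
        tok = torch.full((brain_emb.size(0), 1), bos_id, dtype=torch.long)
        out = []
        for _ in range(max_len):
            x = self.embed(tok)                    # (batch, 1, emb_dim)
            y, h = self.gru(x, h)
            tok = self.head(y[:, -1]).argmax(-1, keepdim=True)  # greedy next token
            out.append(tok)
            if (tok == eos_id).all():              # every caption finished
                break
        return torch.cat(out, dim=1)               # generated token ids

decoder = CaptionDecoder()
brain_emb = torch.randn(4, 512)        # stand-in for aligned fMRI embeddings
token_ids = decoder.generate(brain_emb)  # map ids back to words with a vocabulary
```

In a real pipeline the two stages would be composed: the contrastively trained fMRI encoder produces `brain_emb`, and decoding quality is then scored against reference captions with standard text metrics.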