🤖 AI Summary
The neural representation of visual concepts in the human brain remains poorly understood, with existing studies constrained by small sample sizes, heavy manual curation, region-specific focus, and insufficient systematic validation. To address these limitations, we propose the first whole-brain, data-driven, fully automated framework: it applies unsupervised decomposition (e.g., ICA or PCA) to fMRI signals to extract interpretable neural response patterns; aligns these patterns with natural image stimuli via response matching; and integrates multimodal alignment (image–text) to assign semantic labels automatically and validate cross-subject reliability. Applied to a thousand-subject fMRI dataset, our method identifies thousands of high-reliability, fine-grained visual representations—spanning objects, attributes, and scenes—and reveals, for the first time, systematic semantic encoding beyond occipital cortex, extending into temporal, parietal, and prefrontal regions. This advances brain decoding in scalability, interpretability, and reproducibility.
📝 Abstract
Understanding how the human brain represents visual concepts, and in which brain regions these representations are encoded, remains a long-standing challenge. Decades of work have advanced our understanding of visual representations, yet brain signals remain large and complex, and the space of possible visual concepts is vast. As a result, most studies remain small-scale, rely on manual inspection, focus on specific regions and properties, and rarely include systematic validation. We present a large-scale, automated framework for discovering and explaining visual representations across the human cortex. Our method comprises two main stages. First, we discover candidate interpretable patterns in fMRI activity through unsupervised, data-driven decomposition methods. Next, we explain each pattern by identifying the set of natural images that most strongly elicit it and generating a natural-language description of their shared visual meaning. To scale this process, we introduce an automated pipeline that tests multiple candidate explanations, assigns quantitative reliability scores, and selects the most consistent description for each voxel pattern. Our framework reveals thousands of interpretable patterns spanning many distinct visual concepts, including fine-grained representations previously unreported.