🤖 AI Summary
This work addresses the lack of a unified, cross-modal, cross-task evaluation framework in brain–AI alignment research. To this end, we propose the "Brain-like Space": a unified coordinate system that maps the intrinsic representational geometry of vision, language, and multimodal AI models onto human brain functional networks. Focusing on Transformer-based architectures, we map each model's spatial attention topology onto brain networks, calibrated with large-scale neural response data, to quantitatively analyze 151 state-of-the-art models. Our analysis reveals, for the first time, that the Brain-like Space exhibits a continuous arc-shaped geometric structure, indicating a gradual progression in AI information organization from artificial to brain-like principles. We further identify significant correlations between brain-likeness and both the pretraining paradigm and the positional encoding mechanism, yet find that greater brain-likeness does not necessarily improve downstream task performance. These findings establish a novel, interpretable paradigm for evaluating brain-inspired intelligence.
📝 Abstract
For decades, neuroscientists and computer scientists have pursued a shared ambition: to understand intelligence and to build it. Modern artificial neural networks now rival humans in language, perception, and reasoning, yet it remains largely unknown whether these artificial systems organize information as the brain does. Existing brain–AI alignment studies have revealed striking correspondences between the two systems, but such comparisons remain bound to specific inputs and tasks, offering no common ground for comparing how AI models of different modalities (vision, language, or multimodal) are intrinsically organized. Here we introduce the Brain-like Space: a unified geometric space in which any AI model can be precisely situated and compared, regardless of input modality, task, or sensory domain, by mapping its intrinsic spatial attention topology onto canonical human functional brain networks. An extensive analysis of 151 Transformer-based models, spanning state-of-the-art large vision models, large language models, and large multimodal models, uncovers a continuous arc-shaped geometry within this space that reflects a gradual increase in brain-likeness. Models occupy distinct regions of this geometry, associated with different degrees of brain-likeness, shaped not merely by their modality but by whether the pretraining paradigm emphasizes global semantic abstraction and whether the positional encoding scheme facilitates deep fusion across modalities. Moreover, a model's degree of brain-likeness and its downstream task performance are not "identical twins". The Brain-like Space provides the first unified framework for situating, quantifying, and comparing intelligence across domains, revealing the deep organizational principles that bridge machines and the brain.