Do Models See in Line with Human Vision? Probing the Correspondence Between LVLM Representations and EEG Signals

📅 2026-03-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study investigates whether the internal representations of large vision-language models (LVLMs) align with human visual cognition. Leveraging image-evoked electroencephalography (EEG) signals and employing ridge regression alongside representational similarity analysis, the authors systematically evaluate the neural alignment between the intermediate layers of 32 open-source LVLMs and human brain responses within the 100–300 ms post-stimulus time window. The work establishes, for the first time, a correspondence between LVLM intermediate representations and EEG signals in this critical temporal window, proposing “neural alignment” as a novel benchmark for evaluating LVLMs. Results indicate that multimodal architecture design contributes more significantly to brain alignment than model parameter count, and that high-performing LVLMs exhibit stronger human-like visual representational capabilities.
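The encoding-model half of the analysis can be illustrated with a minimal sketch: fit a cross-validated ridge regression that predicts EEG responses from LVLM layer features, and score alignment as the correlation between predicted and held-out responses. The data here are random placeholders with hypothetical shapes (200 images, 768-dim features, 17 channels × 40 time bins); this is not the authors' actual pipeline, only the standard encoding recipe their summary describes.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Hypothetical shapes: 200 images, 768-dim LVLM layer features,
# 17 EEG channels x 40 time bins, flattened per image.
n_images, n_feat, n_eeg = 200, 768, 17 * 40
lvlm_features = rng.standard_normal((n_images, n_feat))
eeg_responses = rng.standard_normal((n_images, n_eeg))

# Cross-validated ridge encoding: predict EEG from LVLM features,
# score alignment as the mean Pearson r on held-out images.
scores = []
for train, test in KFold(n_splits=5).split(lvlm_features):
    model = Ridge(alpha=1.0).fit(lvlm_features[train], eeg_responses[train])
    pred = model.predict(lvlm_features[test])
    true = eeg_responses[test]
    # Pearson r per EEG dimension, then averaged
    pred_c = pred - pred.mean(axis=0)
    true_c = true - true.mean(axis=0)
    r = (pred_c * true_c).sum(axis=0) / (
        np.linalg.norm(pred_c, axis=0) * np.linalg.norm(true_c, axis=0) + 1e-12
    )
    scores.append(r.mean())

alignment = float(np.mean(scores))
```

With random data the score hovers near zero; on real stimulus-locked EEG, a layer-by-layer sweep of this score is what produces the reported peak at intermediate layers.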

๐Ÿ“ Abstract
Large Vision-Language Models (LVLMs) exhibit strong visual understanding and reasoning abilities. However, whether their internal representations reflect human visual cognition is still under-explored. In this paper, we address this by quantifying LVLM-brain alignment using image-evoked Electroencephalogram (EEG) signals, analyzing the effects of model architecture, scale, and image type. Specifically, using ridge regression and representational similarity analysis, we compare visual representations from 32 open-source LVLMs with corresponding EEG responses. We observe a structured LVLM-brain correspondence: First, intermediate layers (8-16) show peak alignment with EEG activity in the 100-300 ms window, consistent with hierarchical human visual processing. Second, multimodal architectural design contributes 3.4× more to brain alignment than parameter scaling, and models with stronger downstream visual performance exhibit higher EEG similarity. Third, spatiotemporal patterns further align with known cortical visual pathways. These results demonstrate that LVLMs learn human-aligned visual representations and establish neural alignment as a biologically grounded benchmark for evaluating and improving LVLMs. In addition, these results may inform the development of neuro-inspired applications.
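The other half of the method, representational similarity analysis (RSA), compares geometry rather than predictions: build a representational dissimilarity matrix (RDM) over image pairs in the model space and in the EEG space, then correlate the two. The sketch below uses random placeholder data with hypothetical shapes (100 images, a 512-dim layer, EEG from a 100-300 ms window flattened to 340 values); correlation distance and Spearman rank correlation are the conventional RSA choices, not details confirmed by the abstract.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Hypothetical data: 100 images, one LVLM layer's features, and
# EEG responses in a 100-300 ms window, flattened per image.
n_images = 100
layer_features = rng.standard_normal((n_images, 512))
eeg_window = rng.standard_normal((n_images, 340))

# RDMs: correlation distance between every pair of images, in each
# space. pdist returns the condensed upper triangle directly.
rdm_model = pdist(layer_features, metric="correlation")
rdm_eeg = pdist(eeg_window, metric="correlation")

# RSA score: rank correlation between the two RDMs.
rsa_score, _ = spearmanr(rdm_model, rdm_eeg)
```

Repeating this per layer and per EEG time bin yields the layer-by-time alignment maps from which spatiotemporal correspondence with cortical visual pathways can be read off.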
Problem

Research questions and friction points this paper is trying to address.

LVLM-brain alignment
human visual cognition
EEG signals
visual representations
neural correspondence
Innovation

Methods, ideas, or system contributions that make the work stand out.

LVLM-brain alignment
EEG signals
representational similarity analysis
multimodal architecture
neural benchmarking
👥 Authors
Xin Xiao, ByteDance Research (VLA, VLM)
Yang Lei, Chongqing University
Haoyang Zeng, Xaira Therapeutics (Machine Learning, Protein Design, Peptide Vaccine, Gene Regulation)
Xiao Sun, Chongqing University
Xinyi Jiang, UNSW Sydney
Yu Tian, Tsinghua University
Hao Wu, Asa and Patricia Springer Professor, Boston Children's Hospital and Harvard Medical School (Structural biology, innate immunity, adaptive immunity, therapeutics)
Kaiwen Wei, Chongqing University
Jiang Zhong, Chongqing University