🤖 AI Summary
This study investigates whether the internal representations of large vision-language models (LVLMs) align with human visual cognition. Leveraging image-evoked electroencephalography (EEG) signals and employing ridge regression alongside representational similarity analysis, the authors systematically evaluate the neural alignment between the intermediate layers of 32 open-source LVLMs and human brain responses within the 100–300 ms post-stimulus time window. The work establishes, for the first time, a correspondence between LVLM intermediate representations and EEG signals in this critical temporal window, proposing "neural alignment" as a novel benchmark for evaluating LVLMs. Results indicate that multimodal architecture design contributes more significantly to brain alignment than model parameter count, and that high-performing LVLMs exhibit stronger human-like visual representational capabilities.
📄 Abstract
Large Vision-Language Models (LVLMs) exhibit strong visual understanding and reasoning abilities. However, whether their internal representations reflect human visual cognition remains under-explored. In this paper, we address this question by quantifying LVLM-brain alignment using image-evoked electroencephalogram (EEG) signals, analyzing the effects of model architecture, scale, and image type. Specifically, using ridge regression and representational similarity analysis, we compare visual representations from 32 open-source LVLMs with the corresponding EEG responses. We observe a structured LVLM-brain correspondence. First, intermediate layers (8–16) show peak alignment with EEG activity in the 100–300 ms window, consistent with hierarchical human visual processing. Second, multimodal architectural design contributes 3.4× more to brain alignment than parameter scaling, and models with stronger downstream visual performance exhibit higher EEG similarity. Third, spatiotemporal alignment patterns further match known cortical visual pathways. These results demonstrate that LVLMs learn human-aligned visual representations and establish neural alignment as a biologically grounded benchmark for evaluating and improving LVLMs. These findings may also inform the development of neuro-inspired applications.
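The two alignment metrics named in the abstract, ridge-regression encoding and representational similarity analysis (RSA), can be illustrated with a minimal sketch. This is not the paper's pipeline; the array shapes, synthetic data, and hyperparameters (e.g. `alpha=1.0`, 5-fold cross-validation) are illustrative assumptions. `model_feats` stands in for a layer's activations per image and `eeg` for trial-averaged EEG amplitudes in the 100–300 ms window.

```python
# Hypothetical sketch of encoding-model and RSA alignment scores.
# All data below are synthetic; shapes and settings are assumptions,
# not the paper's actual configuration.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_images, d, n_channels = 200, 64, 17
model_feats = rng.standard_normal((n_images, d))        # layer activations per image
# Synthetic EEG that partially depends on the model features
eeg = model_feats @ rng.standard_normal((d, n_channels)) \
      + rng.standard_normal((n_images, n_channels))

# 1) Encoding analysis: cross-validated ridge regression from model
#    features to EEG, scored as mean Pearson r across channels.
pred = cross_val_predict(Ridge(alpha=1.0), model_feats, eeg, cv=5)
r_per_channel = [np.corrcoef(eeg[:, c], pred[:, c])[0, 1]
                 for c in range(n_channels)]
encoding_score = float(np.mean(r_per_channel))

# 2) RSA: Spearman correlation between the condensed representational
#    dissimilarity matrices (pairwise distances across images).
rdm_model = pdist(model_feats, metric="correlation")
rdm_eeg = pdist(eeg, metric="correlation")
rsa_score, _ = spearmanr(rdm_model, rdm_eeg)

print(f"encoding r = {encoding_score:.2f}, RSA rho = {rsa_score:.2f}")
```

Both scores are bounded in [-1, 1] and would, under the paper's framing, be computed per layer and per time window to locate where alignment peaks.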