🤖 AI Summary
This study addresses why intermediate layers of language and speech models predict neural responses to natural language better than output layers do. For the first time, intrinsic dimensionality is employed as a quantitative measure of semantic abstraction, enabling a systematic analysis of the relationship among intrinsic dimensionality, semantic richness, and brain-response prediction accuracy across model layers, using both fMRI and ECoG data. The findings reveal that intermediate layers exhibit higher intrinsic dimensionality and greater semantic richness, both of which significantly enhance their ability to predict brain activity. Furthermore, the synergy between pre-training and brain-informed fine-tuning strengthens the alignment among these three factors, highlighting semantic abstraction as a core mechanism underlying the correspondence between artificial model representations and human neural representations.
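The summary does not name the intrinsic-dimension estimator used; a common choice for neural-network representations is the TwoNN estimator of Facco et al. (2017), which infers dimensionality from the ratio of each point's second- to first-nearest-neighbor distance. The sketch below is a minimal illustration under that assumption; `twonn_id` and `hidden_states` are hypothetical names, not the paper's code.

```python
import numpy as np
from scipy.spatial.distance import cdist

def twonn_id(X: np.ndarray) -> float:
    """TwoNN maximum-likelihood estimate of the intrinsic dimension of X.

    X: (n_points, n_features) array, e.g. one layer's token activations.
    """
    dists = cdist(X, X)                  # pairwise Euclidean distances
    np.fill_diagonal(dists, np.inf)      # exclude self-distances
    nn = np.sort(dists, axis=1)
    mu = nn[:, 1] / nn[:, 0]             # ratio of 2nd to 1st NN distance
    mu = mu[np.isfinite(mu) & (mu > 1)]  # drop duplicate/degenerate points
    # MLE under the TwoNN model, P(mu > m) = m^(-d):  d = n / sum(log mu)
    return len(mu) / np.sum(np.log(mu))

# Hypothetical usage: hidden_states[l] is an (n_tokens, d_model) matrix of
# activations extracted from layer l over the stimulus text.
# id_profile = [twonn_id(h) for h in hidden_states]
# A mid-layer peak in id_profile is the layerwise signature discussed above.
```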
📝 Abstract
Research has repeatedly demonstrated that intermediate hidden states extracted from large language models and speech audio models predict measured brain responses to natural language stimuli. Yet very little is known about the representation properties that enable this high prediction performance. Why is it the intermediate layers, and not the output layers, that are most effective for this unique and highly general transfer task? We give evidence that the correspondence between speech and language models and the brain derives from shared meaning abstraction, not from their next-word prediction properties. In particular, models construct higher-order linguistic features in their middle layers, signaled by a peak in the layerwise intrinsic dimension, a measure of feature complexity. We show that a layer's intrinsic dimension strongly predicts how well it explains fMRI and ECoG signals; that the relation between intrinsic dimension and brain predictivity arises over model pre-training; and that fine-tuning models to better predict the brain causally increases both the representations' intrinsic dimension and their semantic content. Results suggest that semantic richness, high intrinsic dimension, and brain predictivity mirror each other, and that the key driver of model-brain similarity is rich meaning abstraction of the inputs, with language modeling being a task complex enough (though perhaps not the only one) to require it.
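The "brain predictivity" the abstract refers to is conventionally measured with a cross-validated linear encoding model fit from a layer's activations to the recorded responses. Below is a minimal sketch of that analysis, assuming ridge regression and mean voxelwise Pearson correlation as the score; the regularization grid, the split ratio, and the variable names are illustrative assumptions rather than the paper's exact pipeline.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def brain_predictivity(features: np.ndarray, fmri: np.ndarray) -> float:
    """Mean voxelwise correlation between held-out fMRI and ridge predictions.

    features: (n_timepoints, n_units) layer activations aligned to the scan.
    fmri:     (n_timepoints, n_voxels) measured responses.
    """
    n_train = int(0.8 * len(features))   # contiguous split: fMRI is a time series
    X_tr, X_te = features[:n_train], features[n_train:]
    Y_tr, Y_te = fmri[:n_train], fmri[n_train:]
    model = RidgeCV(alphas=np.logspace(-2, 6, 9)).fit(X_tr, Y_tr)
    Y_hat = model.predict(X_te)
    # Pearson r per voxel between predicted and measured held-out responses
    r = [np.corrcoef(Y_hat[:, v], Y_te[:, v])[0, 1]
         for v in range(fmri.shape[1])]
    return float(np.mean(r))
```

Scoring every layer this way and comparing the resulting profile against the layerwise intrinsic-dimension profile is the kind of analysis behind the claim that a layer's intrinsic dimension strongly predicts how well it explains fMRI and ECoG signals.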