Do Language Models Align with Brains? Prediction Scores Are Not Enough

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Studies often infer alignment between language models and the human brain from neural prediction scores alone, but it is unclear whether such scores are sufficient evidence. This work proposes L-PACT, a source-audited validation framework that evaluates alignment along four lines of evidence: predictive, relational, mechanism-stripping, and reliability-bounded. It compares real model features with nuisance baselines and severe controls, tests whether model-to-brain profiles reproduce brain-to-brain patterns, recomputes held-out scores after mechanism stripping, and normalizes evidence against brain-brain ceilings, with assay-sensitivity checks confirming that the gates can return positive evidence when expected. No real model row passed any of the four gates, and all 146 integrated decision rows were control-explained. Raw positive effects that less stringent single-criterion rules would have counted were explained by controls, so the analyzed representations show no evidence of structural alignment under L-PACT. The results argue for stricter criteria in model-brain comparisons.
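The idea of normalizing a score against a brain-brain ceiling can be illustrated with a minimal sketch. The function name, the `min_ceiling` cutoff, and the example numbers are assumptions made for illustration; they are not taken from the paper, which may define its ceiling differently.

```python
from typing import Optional


def ceiling_normalize(model_score: float,
                      brain_ceiling: float,
                      min_ceiling: float = 0.05) -> Optional[float]:
    """Express a model-to-brain score as a fraction of the brain-to-brain ceiling.

    Returns None when the ceiling is too low to serve as a stable denominator.
    The 0.05 cutoff is a hypothetical choice, not a value from the paper.
    """
    if brain_ceiling < min_ceiling:
        return None
    return model_score / brain_ceiling


# Hypothetical numbers: a raw score of 0.12 against a ceiling of 0.40.
normalized = ceiling_normalize(0.12, 0.40)
unreliable = ceiling_normalize(0.12, 0.01)  # ceiling below the cutoff
```

A normalized value near 1 would mean the model predicts a brain about as well as another brain does. Returning `None` for a weak ceiling avoids inflating scores by dividing by a small number.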

📝 Abstract

Brain-language model comparisons often interpret neural prediction scores as evidence that model representations capture brain-relevant language computation. We asked whether language models align with brains, and whether prediction scores are enough to support that claim, using L-PACT, a source-audited framework that evaluates predictive, relational, mechanism-stripping, and reliability-bounded evidence. Across primary naturalistic language neural datasets and derived language-model representations, L-PACT compared real model features with nuisance baselines and severe controls, tested whether model-to-brain profiles reproduced brain-to-brain patterns, recomputed held-out scores after mechanism stripping, and normalized evidence against brain-brain ceilings. The locked analysis set contains 414 predictive-control rows, 2304 relational profile rows, 4320 mechanism-stripping rows, 420 brain-brain ceiling rows, and 146 integrated decision rows. Assay-sensitivity checks showed that brain-brain reliability, brain-as-model run-to-run relational profiles, independent low-level neural and WAV-derived acoustic-envelope gates, and a deterministic implanted-signal simulation can produce positive evidence when expected. Nevertheless, no real model row passed the predictive, relational, mechanism-stripping, or operational Turing-bounded reliability gates; all 146 integrated rows were control-explained. Less stringent single-criterion rules would have counted raw positive predictive, relational, stripping-delta, and ceiling-normalized effects, but L-PACT downgraded them because controls explained the apparent evidence. In the analyzed derived artifact set, the tested language-model representations do not satisfy L-PACT alignment gates; apparent positives are converted into an auditable control-explained taxonomy rather than treated as structural alignment.
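The distinction between a raw positive effect and one that survives controls can be sketched as a simple decision rule. This is a minimal illustration under stated assumptions: the `Row` fields, the label strings, and the `margin` parameter are hypothetical, and the actual L-PACT gates combine several kinds of evidence that this sketch does not model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    """One hypothetical evaluation row: a real-model score plus control scores."""
    model_score: float
    control_scores: List[float]  # e.g. nuisance baselines and severe controls


def classify(row: Row, margin: float = 0.0) -> str:
    """Label a row by whether its effect exceeds the strongest control."""
    if row.model_score <= 0:
        return "no_effect"
    # With no controls supplied, the score is compared against zero.
    strongest_control = max(row.control_scores, default=0.0)
    if row.model_score - strongest_control > margin:
        return "passes_gate"
    # Positive in isolation, but a control does at least as well.
    return "control_explained"


rows = [
    Row(0.20, [0.10, 0.25]),   # positive, but one control scores higher
    Row(0.30, [0.10, 0.20]),   # exceeds every control
    Row(-0.05, [0.00]),        # no positive effect at all
]
labels = [classify(r) for r in rows]
```

A single-criterion rule that checked only `model_score > 0` would count the first row as positive evidence, whereas comparing against the strongest control labels it control-explained.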

Problem

Research questions and friction points this paper is trying to address.

language models

brain alignment

neural prediction

model evaluation

control baselines

Innovation

Methods, ideas, or system contributions that make the work stand out.

L-PACT

brain-language alignment

mechanism-stripping