Large Language Models Are Human-Like Internally

📅 2025-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior evaluations of large language models (LLMs) as cognitive models have relied solely on top-layer outputs, neglecting potential hierarchical correspondences between LLM internal representations and human neurocognitive language processing. Method: The authors use mechanistic interpretability to align next-word probabilities read out from LLM hidden layers with multimodal human language-processing data—self-paced reading times, eye-tracking gaze durations, MAZE task processing times, and N400 brain potentials—under controlled linguistic stimuli. Contribution/Results: They identify a temporally structured layer-wise correspondence: earlier LLM layers align more closely with fast eye-movement measures such as gaze durations, whereas later layers better match slower signals such as N400 potentials and MAZE processing times. This reveals an intrinsic functional hierarchy in LLMs whose timescales mirror human language processing, correcting the bias introduced by top-layer-only cognitive evaluation. Moreover, next-word probabilities derived from internal layers of larger LMs fit these behavioral and neural measures as well as, or better than, those from smaller LMs, strengthening LLMs' interpretability and their cross-disciplinary utility as computational cognitive models.
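
A minimal sketch of one way layer-wise next-word probabilities can be read out from a language model, using a "logit lens" (final layer norm plus unembedding applied to intermediate hidden states) over GPT-2 via Hugging Face transformers. The model choice and the logit-lens readout are illustrative assumptions, not necessarily the paper's exact extraction procedure.

```python
# Sketch: next-token distributions read out from every layer of GPT-2
# via a logit lens (assumption: ln_f + lm_head applied to each layer).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The cat sat on the mat", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# hidden_states is a tuple of (num_layers + 1) tensors, each [batch, seq, hidden].
layer_probs = []
for hidden in outputs.hidden_states:
    # Project each layer's states through the final layer norm and unembedding.
    logits = model.lm_head(model.transformer.ln_f(hidden))
    layer_probs.append(torch.softmax(logits, dim=-1))

# layer_probs[k][0, t] is the next-token distribution read out from layer k at
# position t; per-layer distributions like these can then be compared against
# human reading-time or N400 data.
print(len(layer_probs), layer_probs[0].shape)
```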

📝 Abstract
Recent cognitive modeling studies have reported that larger language models (LMs) exhibit a poorer fit to human reading behavior, leading to claims of their cognitive implausibility. In this paper, we revisit this argument through the lens of mechanistic interpretability and argue that prior conclusions were skewed by an exclusive focus on the final layers of LMs. Our analysis reveals that next-word probabilities derived from internal layers of larger LMs align with human sentence processing data as well as, or better than, those from smaller LMs. This alignment holds consistently across behavioral (self-paced reading times, gaze durations, MAZE task processing times) and neurophysiological (N400 brain potentials) measures, challenging earlier mixed results and suggesting that the cognitive plausibility of larger LMs has been underestimated. Furthermore, we identify, for the first time, an intriguing relationship between LM layers and human measures: earlier layers correspond more closely with fast gaze durations, while later layers better align with relatively slower signals such as N400 potentials and MAZE processing times. Our work opens new avenues for interdisciplinary research at the intersection of mechanistic interpretability and cognitive modeling.
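
A minimal sketch of how one layer's next-word probabilities can be turned into per-token surprisal and related to reading times. The layer_surprisal helper, the stand-in data, and the simple Spearman correlation are illustrative assumptions; the paper's evaluations against self-paced reading, gaze durations, MAZE times, and N400 amplitudes are more involved.

```python
# Sketch: per-token surprisal from one layer's readout, correlated with
# (hypothetical) reading times.
import numpy as np
from scipy.stats import spearmanr

def layer_surprisal(probs, input_ids):
    """Surprisal (in bits) of each observed token under one layer's readout.

    probs:     [seq_len, vocab] next-token distributions from that layer
    input_ids: [seq_len] token ids of the sentence
    """
    # Probability assigned at position t to the token actually observed at t+1.
    p_next = probs[:-1, :][np.arange(len(input_ids) - 1), input_ids[1:]]
    return -np.log2(p_next)

# Stand-in data (assumptions): random distributions in place of a real layer's
# readout, arbitrary token ids, and hypothetical per-token reading times (ms).
probs = np.random.dirichlet(np.ones(50257), size=6)
input_ids = np.array([464, 3797, 3332, 319, 262, 2603])
reading_times = np.array([310.0, 295.0, 330.0, 350.0, 280.0])

surprisal = layer_surprisal(probs, input_ids)
rho, p = spearmanr(surprisal, reading_times)
print(f"Spearman rho between layer surprisal and reading times: {rho:.2f} (p={p:.3f})")
```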
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Reading Comprehension
Cognitive Simulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Human Reading Behavior
Neural Correlation