🤖 AI Summary
This study investigates similarities and differences between large language models (LLMs) and humans in processing garden-path sentences—a classic psycholinguistic probe of incremental syntactic parsing. Method: Using a psycholinguistically grounded question-answering paradigm, we systematically evaluate real-time syntactic processing in GPT-4, Claude-3, and other state-of-the-art LLMs, directly adapting established human parsing hypotheses (e.g., Late Closure, Minimal Attachment) into an LLM evaluation framework. We control for syntactic complexity and employ cross-modal validation via text rewriting and text-to-image generation. Contribution/Results: Multiple advanced LLMs exhibit strong correlation with human behavioral patterns across key metrics (r > 0.8) and maintain consistent performance across tasks. Although LLMs' syntactic representations are shallower than humans', they robustly replicate human-like syntactic difficulty profiles—demonstrating interpretable, cognitively plausible behavior. This work establishes the first empirically testable evaluation framework grounded in classical psycholinguistics for modeling linguistic cognition in LLMs.
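The headline comparison (r > 0.8 between model and human behavior) is a Pearson correlation over per-condition difficulty measures. The sketch below illustrates the shape of that analysis with placeholder accuracies per garden-path condition; the numbers and the `pearson_r` helper are illustrative assumptions, not data or code from the study.

```python
# Hypothetical sketch of the human-vs.-LLM correlation analysis.
# All accuracy values below are placeholders, NOT data from the study.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Illustrative comprehension accuracy per garden-path condition:
human_acc = [0.92, 0.61, 0.55, 0.88, 0.47]
model_acc = [0.95, 0.66, 0.58, 0.90, 0.52]

r = pearson_r(human_acc, model_acc)
print(f"r = {r:.3f}")  # a value near 1 means the model's difficulty
                       # profile tracks the human one
```

An r above 0.8 in this kind of analysis indicates that the conditions humans find hard are also the ones the model finds hard, even if absolute accuracies differ.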
📝 Abstract
Modern Large Language Models (LLMs) have shown human-like abilities in many language tasks, sparking interest in comparing LLMs' and humans' language processing. In this paper, we conduct a detailed comparison of the two on a sentence comprehension task using garden-path constructions, which are notoriously challenging for humans. Based on psycholinguistic research, we formulate hypotheses on why garden-path sentences are hard, and test these hypotheses on human participants and a large suite of LLMs using comprehension questions. Our findings reveal that both LLMs and humans struggle with specific syntactic complexities, with some models showing high correlation with human comprehension. To complement these findings, we test LLM comprehension of garden-path constructions with paraphrasing and text-to-image generation tasks, and find that the results mirror those from the comprehension questions, further validating our conclusions about LLM understanding of these constructions.