🤖 AI Summary
This study investigates similarities and differences between large language models (LLMs) and humans in processing garden-path sentences—a classic psycholinguistic probe of incremental syntactic parsing. Method: Using a psycholinguistically grounded question-answering paradigm, we systematically evaluate real-time syntactic processing in GPT-4, Claude-3, and other state-of-the-art LLMs, directly adapting established human parsing hypotheses (e.g., Late Closure, Minimal Attachment) into an LLM evaluation framework. We control for syntactic complexity and employ cross-modal validation via text rewriting and text-to-image generation. Contribution/Results: Multiple advanced LLMs exhibit strong correlation with human behavioral patterns across key metrics (r > 0.8) and maintain consistent performance across tasks. Although LLMs' syntactic representations are shallower than humans', they robustly replicate human-like syntactic difficulty profiles—demonstrating interpretable, cognitively plausible behavior. This work establishes the first empirically testable evaluation framework grounded in classical psycholinguistics for modeling linguistic cognition in LLMs.
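The headline comparison (r > 0.8 between model and human behavior) is a Pearson correlation over per-condition difficulty measures. The sketch below illustrates the shape of that analysis with placeholder accuracies per garden-path condition; the numbers and the `pearson_r` helper are illustrative assumptions, not data or code from the study.

```python
# Hypothetical sketch of the human-vs.-LLM correlation analysis.
# All accuracy values below are placeholders, NOT data from the study.

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Illustrative comprehension accuracy per garden-path condition:
human_acc = [0.92, 0.61, 0.55, 0.88, 0.47]
model_acc = [0.95, 0.66, 0.58, 0.90, 0.52]

r = pearson_r(human_acc, model_acc)
print(f"r = {r:.3f}")  # a value near 1 means the model's difficulty
                       # profile tracks the human one
```

An r above 0.8 in this kind of analysis indicates that the conditions humans find hard are also the ones the model finds hard, even if absolute accuracies differ.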
📝 Abstract
Modern Large Language Models (LLMs) have shown human-like abilities in many language tasks, sparking interest in comparing LLMs' and humans' language processing. In this paper, we conduct a detailed comparison of the two on a sentence comprehension task using garden-path constructions, which are notoriously challenging for humans. Based on psycholinguistic research, we formulate hypotheses on why garden-path sentences are hard, and test these hypotheses on human participants and a large suite of LLMs using comprehension questions. Our findings reveal that both LLMs and humans struggle with specific syntactic complexities, with some models showing high correlation with human comprehension. To complement these findings, we test LLM comprehension of garden-path constructions with paraphrasing and text-to-image generation tasks, and find that the results mirror those from the comprehension questions, further validating our conclusions about LLM understanding of these constructions.