Leveraging Human Production-Interpretation Asymmetries to Test LLM Cognitive Plausibility

📅 2025-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit human-like linguistic processing, focusing on the production-interpretation asymmetry documented in humans for implicit causality verbs. We formalize this asymmetry as a benchmark for assessing LLMs' cognitive plausibility. Our methodology integrates controlled linguistic task design, behavioral analysis of instruction-tuned models, and a dual (quantitative and qualitative) evaluation framework. Results demonstrate that some instruction-tuned LLMs replicate human-like production-interpretation disparities; this capability scales with model parameter count and is significantly modulated by the choice of meta-linguistic prompt. To our knowledge, this work introduces the first cognitive-plausibility benchmark grounded in established theories of human sentence processing, and it uncovers a synergistic interaction between model scale and prompt design in eliciting human-like linguistic behavior.

📝 Abstract
Whether large language models (LLMs) process language similarly to humans has been the subject of much theoretical and practical debate. We examine this question through the lens of the production-interpretation distinction found in human sentence processing and evaluate the extent to which instruction-tuned LLMs replicate this distinction. Using an empirically documented asymmetry between production and interpretation in humans for implicit causality verbs as a testbed, we find that some LLMs do quantitatively and qualitatively reflect human-like asymmetries between production and interpretation. We demonstrate that whether this behavior holds depends upon both model size, with larger models more likely to reflect human-like patterns, and the choice of meta-linguistic prompts used to elicit the behavior.
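As a concrete illustration of the testbed, below is a minimal sketch of how production and interpretation could be elicited for the same implicit causality (IC) verbs. The verbs, bias labels, names, and the commented-out query_llm helper are illustrative assumptions, not the paper's actual stimuli or evaluation code.

```python
# Minimal sketch: contrasting production and interpretation prompts
# for implicit causality (IC) verbs. Verbs, bias labels, and names
# are illustrative placeholders, not the paper's stimuli.

# Stimulus-experiencer verbs such as "fascinated" are typically
# subject-biased; experiencer-stimulus verbs such as "admired" are
# typically object-biased.
IC_VERBS = {
    "fascinated": "subject",
    "frightened": "subject",
    "admired": "object",
    "feared": "object",
}

def production_prompt(subj: str, verb: str, obj: str) -> str:
    """Free continuation: the model chooses the next mention itself."""
    return f"Continue the sentence naturally: {subj} {verb} {obj} because"

def interpretation_prompt(subj: str, verb: str, obj: str, pronoun: str) -> str:
    """Meta-linguistic judgment: the model resolves an ambiguous pronoun."""
    return (
        f'Consider the sentence: "{subj} {verb} {obj} because {pronoun} ..."\n'
        f'Who does "{pronoun}" most likely refer to, {subj} or {obj}? '
        "Answer with one name."
    )

for verb, bias in IC_VERBS.items():
    prod = production_prompt("Sally", verb, "Mary")
    interp = interpretation_prompt("Sally", verb, "Mary", "she")
    # continuation = query_llm(prod)   # hypothetical LLM call
    # referent = query_llm(interp)     # hypothetical LLM call
    print(f"[{bias}-biased] PRODUCTION:     {prod}")
    print(f"[{bias}-biased] INTERPRETATION: {interp}\n")
```

The key design choice is that both conditions share identical sentence material; only the task wording differs, so any behavioral gap between them can be attributed to the production-interpretation contrast rather than to the stimuli.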
Problem

Research questions and friction points this paper is trying to address.

Test LLM cognitive plausibility via human production-interpretation asymmetries
Evaluate whether LLMs replicate the production-interpretation distinction from human sentence processing
Assess how model size and meta-linguistic prompts modulate human-like asymmetries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverage human production-interpretation asymmetries as a cognitive-plausibility testbed
Test instruction-tuned LLMs with implicit causality verbs
Evaluate effects of model size and meta-linguistic prompts (a scoring sketch follows this list)
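As a rough illustration of the quantitative side of the evaluation, the sketch below shows one way a production-interpretation gap could be scored per model. The response records are fabricated placeholders included only to demonstrate the arithmetic; they are not the paper's data.

```python
# Sketch: scoring bias-consistency in each task condition and taking
# the difference as an asymmetry measure. All data below are
# fabricated placeholders, not results from the paper.

def bias_consistency(responses, bias_map):
    """Fraction of trials whose chosen referent matches the verb's IC bias."""
    hits = sum(1 for verb, referent in responses if referent == bias_map[verb])
    return hits / len(responses)

bias_map = {"fascinated": "subject", "admired": "object"}

# (verb, referent-chosen) pairs, one per trial -- placeholder data.
production_responses = [
    ("fascinated", "subject"), ("admired", "object"),
    ("fascinated", "subject"), ("admired", "subject"),
]
interpretation_responses = [
    ("fascinated", "object"), ("admired", "object"),
    ("fascinated", "subject"), ("admired", "subject"),
]

prod_rate = bias_consistency(production_responses, bias_map)
interp_rate = bias_consistency(interpretation_responses, bias_map)

# A human-like asymmetry would appear as a systematic gap between the
# two conditions for the same verbs; repeating this across model sizes
# and prompt wordings probes the scale and prompt effects.
print(f"production bias-consistency:     {prod_rate:.2f}")
print(f"interpretation bias-consistency: {interp_rate:.2f}")
print(f"asymmetry (production - interpretation): {prod_rate - interp_rate:+.2f}")
```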