🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit human-like, weight-driven preferences in syntactic constituent reordering, a hallmark of human syntactic processing. Method: We design a controlled, zero-shot preference evaluation framework targeting four linguistic phenomena (heavy NP shift, particle movement, dative alternation, and multiple prepositional-phrase ordering) and test it across multiple generations of LLMs (GPT, Llama, Claude). Contribution/Results: On three of the four phenomena (all but particle movement), every model aligns significantly with human experimental judgments (p < 0.01), providing the first systematic evidence that LLMs recapitulate human word-order intuitions across diverse constituent-movement constructions. This challenges the prevailing critique that LLMs rely solely on surface-level statistical correlations without cognitively plausible syntactic representations, and offers empirical support for the implicit acquisition of weight-sensitive syntactic knowledge in LLMs.
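A zero-shot pairwise preference evaluation of this kind can be sketched as follows. This is a minimal illustration, not the paper's actual harness: the `score` function stands in for a model log-probability, `ITEMS` is a toy stimulus pair, and significance is checked with a one-sided binomial sign test against chance.

```python
# Minimal sketch of a zero-shot pairwise preference evaluation.
# Assumptions (not from the paper): `score` is a stand-in for a model
# log-probability scorer, and ITEMS is an illustrative stimulus pair.
from math import comb

def binom_p_at_least(k, n, p=0.5):
    """One-sided binomial P(X >= k) under chance-level preference."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def agreement_with_humans(items, score):
    """items: (human_preferred, dispreferred) sentence pairs.
    The model 'prefers' whichever variant it scores higher."""
    hits = sum(score(pref) > score(disp) for pref, disp in items)
    n = len(items)
    return hits / n, binom_p_at_least(hits, n)

# Toy heavy-NP-shift pair: humans prefer the light constituent early.
ITEMS = [
    ("She gave the book to the man who lives next door.",
     "She gave to the man who lives next door the book."),
]
# Fake scorer: rewards placing the light NP ("the book") earlier,
# mimicking a weight-sensitive preference.
rate, p = agreement_with_humans(ITEMS, score=lambda s: -s.index("the book"))
```

With many stimulus pairs per phenomenon, a small `p` would indicate that the model's preferences track human judgments above chance, which is the shape of the test the summary reports.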
📝 Abstract
Though English sentences are typically inflexible vis-à-vis word order, constituents often show far more variability in their ordering. One prominent theory holds that constituent ordering is directly correlated with constituent weight: a measure of the constituent's length or complexity. Such theories are interesting in the context of natural language processing (NLP): while recent advances in NLP have led to significant gains in the performance of large language models (LLMs), much remains unclear about how these models process language and how this compares to human language processing. In particular, it remains an open question whether LLMs display the same constituent-movement patterns as humans; if so, their behavior may provide insights into existing theories of when and how such shifts occur in human language. We compare a variety of LLMs with diverse properties to evaluate broad LLM performance on four types of constituent movement: heavy NP shift, particle movement, dative alternation, and multiple PPs. Although the models perform unexpectedly on particle movement, they generally align with human preferences for constituent ordering.