More Aligned, Less Diverse? Analyzing the Grammar and Lexicon of Two Generations of LLMs

📅 2026-05-07

🤖 AI Summary

This study investigates whether recent large language models (LLMs), while better aligned with instructions, have become less linguistically diverse. Working within the formal framework of Head-Driven Phrase Structure Grammar (HPSG), it applies diversity metrics from ecology and information theory to compare the distributions of syntactic structures and lexical types in texts from two generations of LLMs with those in human-authored English news (New York Times articles) from two different years. The human-authored news text changes little over the period studied. The newer, instruction-tuned models, by contrast, show reduced syntactic and, especially, lexical diversity relative to the older, non-instruction-tuned models. The findings suggest that instruction tuning, while improving coherence and prompt adherence, may narrow the expressive range of model output, and they illustrate a grammar-based method for measuring that range.

📝 Abstract

This study contributes to a growing line of research on comparing LLM-generated text with human-authored text, in this case English news text. We focus in particular on evaluating syntactic properties through formal grammar frameworks. Our analysis compares two generations of LLMs against two human-authored English news datasets from two different years. Employing the Head-Driven Phrase Structure Grammar (HPSG) formalism, we investigate the distributions of syntactic structures and lexical types in AI-generated texts and contrast them with the corresponding distributions in human-authored New York Times (NYT) articles. We use diversity metrics from ecology and information theory to quantify variation in grammatical constructions and lexical types. We show that English news text has changed little in the given time frame, while newer LLMs display reduced syntactic and, especially, lexical diversity compared to older, non-instruction-tuned models. These findings point to future work on the effects of instruction tuning, which, while enhancing coherence and adherence to prompts, may narrow the expressive range of model output.
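The abstract does not name the specific ecological and information-theoretic metrics used. As an assumption for illustration only, the sketch below computes four common ones over a multiset of category labels, such as the names of HPSG rules or lexical types observed in parsed text: type richness, Shannon entropy (in bits), the Gini-Simpson index, and Pielou's evenness. The function name `diversity_metrics` and the labels in the usage example are hypothetical placeholders, not the study's code or actual rule names from the grammar it uses.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def diversity_metrics(labels: Iterable[str]) -> Dict[str, float]:
    """Diversity of a multiset of category labels.

    Hypothetical helper: labels could be HPSG rule names or lexical-type
    names extracted from parses; the choice of metrics is an assumption.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one observation")
    probs = [c / total for c in counts.values()]

    # Richness: number of distinct categories observed ("species" count in ecology).
    richness = len(counts)
    # Shannon entropy in bits (information theory; the Shannon index H' up to log base).
    shannon = -sum(p * math.log2(p) for p in probs)
    # Gini-Simpson index: probability that two draws with replacement differ in category.
    gini_simpson = 1.0 - sum(p * p for p in probs)
    # Pielou's evenness H / log2(S); undefined for one category, reported as 0.0 here.
    evenness = shannon / math.log2(richness) if richness > 1 else 0.0

    return {
        "richness": float(richness),
        "shannon_bits": shannon,
        "gini_simpson": gini_simpson,
        "pielou_evenness": evenness,
    }


if __name__ == "__main__":
    # Hypothetical construction labels for two small samples.
    human = ["head-comp", "head-subj", "head-adj", "head-comp", "coord", "head-spec"]
    model = ["head-comp", "head-comp", "head-subj", "head-comp", "head-subj", "head-comp"]
    for name, sample in [("human", human), ("model", model)]:
        print(name, diversity_metrics(sample))
```

Comparisons based on these numbers are only indicative: richness and entropy both tend to grow with sample size, so corpora of different sizes would generally need to be size-matched or subsampled before their diversity scores are compared.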

Problem

Research questions and friction points this paper is trying to address.

large language models

syntactic diversity

lexical diversity

instruction tuning

text generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Head-Driven Phrase Structure Grammar

syntactic diversity

lexical diversity