The Shrinking Landscape of Linguistic Diversity in the Age of Large Language Models

📅 2025-02-16
🤖 AI Summary
This paper reveals that large language models (LLMs), when used as writing assistants, induce systemic linguistic homogenization: they preserve semantic content while significantly reducing individual stylistic diversity, selectively amplifying dominant stylistic features and societal biases, and suppressing marginalized linguistic expressions. The study combines controlled experiments, natural-text observation, quantitative stylistic analysis, bias classifier evaluation, and robustness testing across models, prompts, and scenarios, and is the first to demonstrate the strong generalizability of this phenomenon. Key contributions are: (1) establishing that LLM-driven erosion of linguistic diversity poses profound risks to fairness (e.g., misjudging cultural fit in hiring), clinical diagnostics (loss of individuating language cues), and cultural preservation; and (2) providing the first reproducible, multidimensionally validated framework for assessing the sociolinguistic impact of AI-mediated language intervention.

📝 Abstract
Language is far more than a communication tool. A wealth of information - including but not limited to the identities, psychological states, and social contexts of its users - can be gleaned through linguistic markers, and such insights are routinely leveraged across diverse fields ranging from product development and marketing to healthcare. In four studies utilizing experimental and observational methods, we demonstrate that the widespread adoption of large language models (LLMs) as writing assistants is linked to notable declines in linguistic diversity and may interfere with the societal and psychological insights language provides. We show that while the core content of texts is retained when LLMs polish and rewrite texts, not only do they homogenize writing styles, but they also alter stylistic elements in a way that selectively amplifies certain dominant characteristics or biases while suppressing others - emphasizing conformity over individuality. By varying LLMs, prompts, classifiers, and contexts, we show that these trends are robust and consistent. Our findings highlight a wide array of risks associated with linguistic homogenization, including compromised diagnostic processes and personalization efforts, the exacerbation of existing divides and barriers to equity in settings like personnel selection where language plays a critical role in assessing candidates' qualifications, communication skills, and cultural fit, and the undermining of efforts for cultural preservation.
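
One way to make "homogenization of writing styles" concrete is to compare how spread out a set of texts is in some stylistic feature space before and after rewriting. The sketch below is an illustration only and is not the authors' method: the feature set (`FUNCTION_WORDS`), the helper names `style_vector` and `mean_pairwise_distance`, and the choice of Euclidean distance are all assumptions made for this example.

```python
import math
import re
from itertools import combinations

# Hypothetical feature set: a handful of English function words.
# The paper's actual stylistic features are not specified here.
FUNCTION_WORDS = ("the", "and", "but", "i", "you", "we", "of", "to", "in", "that")


def style_vector(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [tokens.count(word) / total for word in FUNCTION_WORDS]


def mean_pairwise_distance(texts):
    """Average Euclidean distance between the style vectors of all text pairs.

    A lower value means the texts are stylistically closer to one another
    under this (assumed) feature set.
    """
    vectors = [style_vector(t) for t in texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, a drop in `mean_pairwise_distance` from a set of original texts to their LLM-rewritten counterparts would be consistent with the stylistic convergence the abstract describes.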

Problem

Research questions and friction points this paper is trying to address.

Decline in linguistic diversity
Homogenization of writing styles
Impact on societal and psychological insights

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models homogenize writing styles
LLMs amplify dominant biases and characteristics
Linguistic diversity declines with LLM adoption

Zhivar Sourati
Graduate Research Assistant, University of Southern California
Natural Language Processing, Cognitive Psychology, Reasoning, Social Network Analysis
Farzan Karimi-Malekabadi
University of Southern California
Morality, Culture, Large Language Models
Meltem Ozcan
Department of Psychology, University of Southern California
Colin McDaniel
Department of Psychology, University of Southern California
Alireza Ziabari
Department of Computer Science, University of Southern California
Jackson Trager
PhD Candidate, University of Southern California
Moral Psychology, Cultural Conflict, Tech and Society, Polarization & Hate, AI Ethics & Policy
Ala Tak
Department of Computer Science, University of Southern California
Meng Chen
Department of Psychology, University of Southern California
Fred Morstatter
University of Southern California, Information Sciences Institute
Social Media Mining, Data Science, Data Mining, Machine Learning
Morteza Dehghani
Department of Computer Science, University of Southern California