Standard Language Ideology in AI-Generated Language

📅 2024-06-13
🏛️ arXiv.org
📈 Citations: 7
Influential: 0
🤖 AI Summary
This position paper identifies an implicit “Standard AI-Generated Language Ideology” embedded in large language models (LLMs), in which linguistic outputs default to Standard American English (SAE), systematically marginalizing racially minoritized English varieties and reinforcing linguistic inequity.

Method: Integrating critical linguistics, digital humanities, and AI ethics, the work relies on conceptual modeling and systematic literature analysis (not empirical model training) to develop a taxonomy of open problems centered on linguistically marginalized communities.

Contribution/Results: It introduces the theoretical construct of the Standard AI-Generated Language Ideology to explain power dynamics in LLM language generation; defines three dialectical dimensions (standardization vs. variation, monolingualism vs. plurilingualism, and homogenization vs. contextualization) for assessing systemic inclusivity; and advances a decolonial, multilingual framework for generative AI, offering both a critical diagnostic lens and actionable ethical guardrails.

📝 Abstract
In this position paper, we explore standard language ideology in language generated by large language models (LLMs). First, we outline how standard language ideology is reflected and reinforced in LLMs. We then present a taxonomy of open problems regarding standard language ideology in AI-generated language with implications for minoritized language communities. We introduce the concept of standard AI-generated language ideology, the process by which AI-generated language regards Standard American English (SAE) as a linguistic default and reinforces a linguistic bias that SAE is the most "appropriate" language. Finally, we discuss tensions that remain, including reflecting on what desirable system behavior looks like, as well as advantages and drawbacks of generative AI tools imitating, or often not imitating, different English language varieties. Throughout, we discuss standard language ideology as a manifestation of existing global power structures in and through AI-generated language before ending with questions to move towards alternative, more emancipatory digital futures.
Problem

Research questions and friction points this paper is trying to address.

LLMs reinforce standard language ideology in AI-generated content
Standard AI-generated language ideology favors Standard American English
Generative AI tools struggle with diverse English language varieties

Innovation

Methods, ideas, or system contributions that make the work stand out.

Faceted taxonomy for standard language ideology
Concept of standard AI-generated language ideology
Recommendations for emancipatory language outcomes
G. Smith
UC Berkeley
Eve Fleisig
UC Berkeley
Madeline Bossi
UC Berkeley
Ishita Rustagi
UC Berkeley
Xavier Yin
UC Berkeley