Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance

📅 2024-06-25

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

career value

161K/year

🤖 AI Summary

This study identifies a systematic “native-language design bias” in large language models (LLMs): when LLMs infer users to be non-native English speakers—particularly those from non-Western backgrounds—their response quality degrades significantly, with factual error rates increasing by 23% and helpfulness decreasing by 19%. Strikingly, anchoring prompts exacerbate this bias by 41%. Method: We construct the first multilingual background evaluation dataset comprising over 12,000 human-annotated samples. Our methodology integrates cross-group prompt comparison, controlled nativeness prompting, and multidimensional assessment (factuality, fluency, helpfulness). Contribution/Results: We provide the first empirical validation of implicit discrimination against non-native users in LLMs, attributing it to a cognitive anchoring effect triggered by native-language identity inference. This work establishes a critical benchmark and theoretical foundation for fairness evaluation and debiasing optimization of LLMs.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) excel at providing information acquired during pretraining on large-scale corpora and following instructions through user prompts. This study investigates whether the quality of LLM responses varies depending on the demographic profile of users. Considering English as the global lingua franca, along with the diversity of its dialects among speakers of different native languages, we explore whether non-native English speakers receive lower-quality or even factually incorrect responses from LLMs more frequently. Our results show that performance discrepancies occur when LLMs are prompted by native versus non-native English speakers and persist when comparing native speakers from Western countries with others. Additionally, we find a strong anchoring effect when the model recognizes or is made aware of the user's nativeness, which further degrades the response quality when interacting with non-native speakers. Our analysis is based on a newly collected dataset with over 12,000 unique annotations from 124 annotators, including information on their native language and English proficiency.

Problem

Research questions and friction points this paper is trying to address.

Investigating LLM performance disparities between native and non-native English speakers

Examining how user demographic profiles affect response quality in language models

Analyzing anchoring effects when models detect user's English nativeness status

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing performance discrepancies between native and non-native speakers

Investigating anchoring effects based on user nativeness recognition

Using annotated dataset with demographic and proficiency information

🔎 Similar Papers

Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations