Analyzing Dialectical Biases in LLMs for Knowledge and Reasoning Benchmarks

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work reveals a significant performance degradation—up to a 20% accuracy loss—in mainstream large language models (LLMs) on multiple-choice question answering when inputs are written in non-standard English dialects, such as African American Vernacular English (AAVE). To investigate this systematically, the authors construct a dialectal multiple-choice benchmark by applying controlled syntactic transformation rules that generate non-standard variants while preserving the questions' semantics. Through controlled-variable analysis, they identify three grammatical structures as the primary contributors to the performance decline: existential "it," zero copula, and the second-person plural pronoun "y'all." Rather than treating dialect bias holistically, the paper argues for mitigation targeted at individual, high-impact grammatical units, enabling interpretable, fine-grained bias analysis and actionable interventions for LLM fairness and linguistic robustness.
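The transformation approach the summary describes can be illustrated with a minimal sketch. The rules below are hypothetical stand-ins, not the authors' actual implementations; they show how simple, semantics-preserving rewrites could produce the three high-impact features named above (existential "it", zero copula, and "y'all"):

```python
import re

# Illustrative transformation rules in the spirit of the paper's method.
# Order matters: the existential-"it" rule must fire before zero copula,
# or "There is" would lose its "is" first.
RULES = [
    # Existential "it": "There is/are X" -> "It's X".
    (re.compile(r"\bThere (is|are) "), "It's "),
    # Second-person plural pronoun: "you all" -> "y'all".
    (re.compile(r"\byou all\b", re.IGNORECASE), "y'all"),
    # Zero copula: drop standalone "is"/"are" ("They are ready" -> "They ready").
    (re.compile(r"\b(is|are) "), ""),
]

def to_dialect(sentence: str) -> str:
    """Apply each transformation rule in order to a Standard English sentence."""
    for pattern, replacement in RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence

print(to_dialect("There is a chance you all are ready."))
# -> "It's a chance y'all ready."
```

Real transformation rules would need part-of-speech information to avoid over-applying (e.g., deleting a copula that is the main verb of a question), but the controlled, rule-by-rule structure is what enables the paper's per-feature attribution of accuracy loss.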

📝 Abstract
Large language models (LLMs) are ubiquitous in modern-day natural language processing. However, previous work has shown degraded LLM performance for under-represented English dialects. We analyze the effects of typifying "standard" American English language questions as non-"standard" dialectal variants on multiple choice question answering tasks and find up to a 20% reduction in accuracy. Additionally, we investigate the grammatical basis of under-performance in non-"standard" English questions. We find that individual grammatical rules have varied effects on performance, but some are more consequential than others: three specific grammar rules (existential "it", zero copula, and y'all) can explain the majority of performance degradation observed in multiple dialects. We call for future work to investigate bias mitigation methods focused on individual, high-impact grammatical structures.
Problem

Research questions and friction points this paper is trying to address.

How much does LLM multiple-choice accuracy degrade when questions are rendered in non-"standard" English dialects?
Which individual grammatical structures account for the accuracy reduction?
Can bias mitigation target specific, high-impact grammatical features rather than whole dialects?
Innovation

Methods, ideas, or system contributions that make the work stand out.

A dialectal multiple-choice benchmark built via semantics-preserving syntactic transformation rules
Controlled-variable analysis isolating three grammar rules (existential "it", zero copula, y'all) that explain most of the degradation
A call for mitigation methods focused on individual, high-impact grammatical structures