AI Summary
This study identifies a systematic degradation in the reasoning capabilities of large language models (LLMs) when processing African American English (AAE), particularly in social science and humanities tasks, evidenced by a 12.7% average accuracy drop, a 34% reduction in reasoning chain length, and diminished explanation completeness and quality. Method: We introduce the first standardized evaluation framework integrating LLM-based dialect conversion with linguistic analysis, combining contrastive prompting, structured reasoning-chain assessment, and domain-stratified evaluation protocols. Contribution/Results: Our empirical analysis reveals, for the first time, a dialect-dependent attenuation effect: reasoning-chain complexity and explanatory quality degrade significantly under AAE input. This highlights a critical fairness gap in current LLMs across multilingual and multidialectal contexts. The work establishes a reproducible methodological foundation and provides key empirical evidence to guide the development of linguistically inclusive reasoning models.
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities in reasoning tasks, leading to their widespread deployment. However, recent studies have highlighted concerning biases in these models, particularly in their handling of dialectal variations such as African American English (AAE). In this work, we systematically investigate dialectal disparities in LLM reasoning tasks. We develop an experimental framework comparing LLM performance on Standard American English (SAE) and AAE prompts, combining LLM-based dialect conversion with established linguistic analyses. We find that LLMs consistently produce less accurate responses and simpler reasoning chains and explanations for AAE inputs than for equivalent SAE questions, with disparities most pronounced in social science and humanities domains. These findings highlight systematic differences in how LLMs process and reason about different language varieties, raising important questions about the development and deployment of these systems in our multilingual and multidialectal world. Our code repository is publicly available at https://github.com/Runtaozhou/dialect_bias_eval.
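The contrastive evaluation idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: `ask_model` is a hypothetical stand-in for a real LLM call, and the toy question pair is invented for demonstration. The core pattern is scoring the same question in its SAE and AAE phrasings and reporting the per-domain accuracy gap.

```python
# Minimal sketch of contrastive SAE/AAE evaluation (illustrative only).
# `ask_model` is a hypothetical stand-in for an LLM API call; here it is
# a toy model that only "understands" the SAE phrasing.
from collections import defaultdict


def ask_model(prompt: str) -> str:
    # Stand-in: a real implementation would query an LLM here.
    return "4" if "What is 2 + 2?" in prompt else "unsure"


def accuracy_gap(pairs):
    """pairs: list of (domain, sae_prompt, aae_prompt, gold_answer).

    Returns per-domain SAE accuracy minus AAE accuracy; a positive
    gap indicates worse performance on AAE input.
    """
    stats = defaultdict(lambda: {"sae": 0, "aae": 0, "n": 0})
    for domain, sae, aae, gold in pairs:
        s = stats[domain]
        s["n"] += 1
        s["sae"] += ask_model(sae) == gold
        s["aae"] += ask_model(aae) == gold
    return {d: (s["sae"] - s["aae"]) / s["n"] for d, s in stats.items()}


# Invented example pair for illustration.
pairs = [("math", "What is 2 + 2?", "What 2 plus 2 be?", "4")]
print(accuracy_gap(pairs))
```

In the paper's setting, the AAE prompts come from LLM-based dialect conversion of the SAE originals, and the gap is stratified by domain (e.g., social science vs. STEM) rather than computed on a single toy item.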