Finding A Voice: Evaluating African American Dialect Generation for Chatbot Technology

📅 2025-01-07
🤖 AI Summary
This study investigates the capacity of mainstream large language models (LLMs), including Llama, GPT, and Claude, to generate African American Vernacular English (AAVE), and examines how such generation affects user trust and perceived role appropriateness in healthcare and education contexts. Method: Using prompt engineering to vary AAVE intensity systematically, we conduct multi-dimensional subjective evaluations (covering credibility, professionalism, naturalness, and more) via expert annotation and crowdsourced user studies. Contribution/Results: Although the LLMs are proficient at generating AAVE-like language, AAVE-speaking users prefer chatbots that respond in Standard American English (SAE), and higher AAVE intensity correlates with lower subjective ratings for a variety of characteristics, including trustworthiness and role appropriateness. The findings suggest that surface-level dialect alignment alone is insufficient, and potentially detrimental, unless it is grounded in actual user preferences.

📝 Abstract
As chatbots become increasingly integrated into everyday tasks, designing systems that accommodate diverse user populations is crucial for fostering trust, engagement, and inclusivity. This study investigates the ability of contemporary Large Language Models (LLMs) to generate African American Vernacular English (AAVE) and evaluates the impact of AAVE usage on user experiences in chatbot applications. We analyze the performance of three LLM families (Llama, GPT, and Claude) in producing AAVE-like utterances at varying dialect intensities and assess user preferences across multiple domains, including healthcare and education. Despite LLMs' proficiency in generating AAVE-like language, findings indicate that AAVE-speaking users prefer Standard American English (SAE) chatbots, with higher levels of AAVE correlating with lower ratings for a variety of characteristics, including chatbot trustworthiness and role appropriateness. These results highlight the complexities of creating inclusive AI systems and underscore the need for further exploration of diversity to enhance human-computer interactions.
Problem

Research questions and friction points this paper is trying to address.

Chatbot Technology
African American Vernacular English (AAVE)
User Engagement
Innovation

Methods, ideas, or system contributions that make the work stand out.

African American Vernacular English (AAVE)
Chatbot Technology
Language Diversity in Human-Robot Interaction
Sarah E. Finch
School of Nursing, Emory University, Atlanta, GA, USA
Ellie S. Paek
Research Scientist, Emory University
Sejung Kwon
School of Nursing, Emory University, Atlanta, GA, USA
Ikseon Choi
School of Nursing, Emory University, Atlanta, GA, USA
Jessica Wells
School of Nursing, Emory University, Atlanta, GA, USA
Rasheeta D. Chandler
School of Nursing, Emory University, Atlanta, GA, USA
Jinho D. Choi
Associate Professor, Emory University
Natural Language Processing, Computational Linguistics, Conversational AI