🤖 AI Summary
This study investigates the capacity of mainstream large language models (LLMs), including Llama, GPT, and Claude, to generate African American Vernacular English (AAVE) and examines how such generation affects user trust and perceived role appropriateness in healthcare and education contexts. Method: Using prompt engineering to systematically vary AAVE intensity, we conduct multi-dimensional subjective evaluations (credibility, professionalism, naturalness, and related attributes) via expert annotation and crowdsourced user studies. Contribution/Results: Contrary to prevailing assumptions, AAVE-speaking users consistently prefer responses in Standard American English (SAE); improving AAVE generation quality does not enhance user experience, and higher AAVE intensity correlates with significantly lower subjective ratings across all dimensions. These findings challenge the assumption that dialectal adaptation is necessary for inclusive AI, suggesting that surface-level linguistic alignment alone is insufficient, and potentially detrimental, when not grounded in authentic user preferences.
📝 Abstract
As chatbots become increasingly integrated into everyday tasks, designing systems that accommodate diverse user populations is crucial for fostering trust, engagement, and inclusivity. This study investigates the ability of contemporary Large Language Models (LLMs) to generate African American Vernacular English (AAVE) and evaluates the impact of AAVE usage on user experiences in chatbot applications. We analyze the performance of three LLM families (Llama, GPT, and Claude) in producing AAVE-like utterances at varying dialect intensities and assess user preferences across multiple domains, including healthcare and education. Despite LLMs' proficiency in generating AAVE-like language, findings indicate that AAVE-speaking users prefer Standard American English (SAE) chatbots, with higher levels of AAVE correlating with lower ratings for a variety of characteristics, including chatbot trustworthiness and role appropriateness. These results highlight the complexities of creating inclusive AI systems and underscore the need for further exploration of linguistic diversity to enhance human-computer interactions.