🤖 AI Summary
This study addresses the limited robustness of existing harmful content detectors, which are predominantly developed for Standard American English and exhibit systemic biases against speakers of non-standard dialects across the globe. To tackle this issue, the authors propose DIA-HARM, a novel evaluation framework, and introduce D3, the first benchmark corpus spanning 50 English dialects with 195,000 samples, generated via linguistically grounded Multi-VALUE transformations. Through systematic evaluation of 16 models, the work reveals significant performance degradation in multidialectal settings: human-written dialectal content reduces F1 scores by 1.4–3.6%, with some models suffering drops of over 33% on mixed-content inputs. Among all models tested, the multilingual mDeBERTa achieves the strongest performance (average F1: 97.2%), substantially outperforming monolingual and zero-shot large language models, thereby advancing equitable and inclusive content moderation technologies.
📝 Abstract
Harmful content detectors, particularly disinformation classifiers, are predominantly developed and evaluated on Standard American English (SAE), leaving their robustness to dialectal variation unexplored. We present DIA-HARM, the first benchmark for evaluating disinformation detection robustness across 50 English dialects spanning U.S., British, African, Caribbean, and Asia-Pacific varieties. Using Multi-VALUE's linguistically grounded transformations, we introduce D3 (Dialectal Disinformation Detection), a corpus of 195K samples derived from established disinformation benchmarks. Our evaluation of 16 detection models reveals systematic vulnerabilities: human-written dialectal content degrades detection by 1.4–3.6% F1, while AI-generated content remains stable. Fine-tuned transformers substantially outperform zero-shot LLMs (96.6% vs. 78.3% best-case F1), with some models exhibiting catastrophic failures exceeding 33% degradation on mixed content. Cross-dialectal transfer analysis across 2,450 dialect pairs shows that multilingual models (mDeBERTa: 97.2% average F1) generalize effectively, while monolingual models like RoBERTa and XLM-RoBERTa fail on dialectal inputs. These findings demonstrate that current disinformation detectors may systematically disadvantage hundreds of millions of non-SAE speakers worldwide. We release the DIA-HARM framework, D3 corpus, and evaluation tools: https://github.com/jsl5710/dia-harm
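The core measurement in the abstract is F1 degradation: the same detector is scored on SAE inputs and on their dialect-transformed counterparts, and the relative F1 drop quantifies the robustness gap. A minimal sketch of that comparison is below; the labels and predictions are illustrative toy values, not drawn from the D3 corpus, and this is not the authors' released evaluation code.

```python
# Hypothetical sketch of the F1-degradation comparison described in the
# abstract: score one detector on SAE inputs vs. dialect-transformed
# inputs and report the relative drop. Toy data, no external dependencies.

def f1_score(y_true, y_pred, positive=1):
    """Binary F1 computed from true/false positive and false negative counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Gold labels (1 = disinformation) and one detector's predictions on the
# same items, rendered in SAE and in a dialect-transformed variant.
gold         = [1, 1, 1, 0, 0, 1, 0, 1]
pred_sae     = [1, 1, 1, 0, 0, 1, 0, 0]
pred_dialect = [1, 0, 1, 0, 1, 1, 0, 0]

f1_sae = f1_score(gold, pred_sae)
f1_dia = f1_score(gold, pred_dialect)
degradation = (f1_sae - f1_dia) / f1_sae * 100  # relative F1 drop, in percent
```

In the paper's setting this comparison is repeated per dialect (and across the 2,450 dialect pairs for transfer analysis); the sketch only shows the per-pair arithmetic.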