DIA-HARM: Dialectal Disparities in Harmful Content Detection Across 50 English Dialects

📅 2026-04-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limited robustness of existing harmful content detectors, which are developed predominantly for Standard American English and exhibit systematic biases against speakers of non-standard dialects worldwide. To tackle this issue, the authors propose DIA-HARM, a novel evaluation framework, and introduce D3, the first benchmark corpus spanning 50 English dialects with 195,000 samples, built via linguistically grounded Multi-VALUE transformations. A systematic evaluation of 16 models reveals significant performance degradation in multidialectal settings: human-written dialectal content reduces F1 scores by 1.4–3.6%, and some models suffer drops of over 33% on mixed-content inputs. Among all models tested, the multilingual mDeBERTa achieves the strongest performance (average F1: 97.2%), substantially outperforming monolingual and zero-shot large language models, thereby advancing equitable and inclusive content moderation technologies.
📝 Abstract
Harmful content detectors, particularly disinformation classifiers, are predominantly developed and evaluated on Standard American English (SAE), leaving their robustness to dialectal variation unexplored. We present DIA-HARM, the first benchmark for evaluating disinformation detection robustness across 50 English dialects spanning U.S., British, African, Caribbean, and Asia-Pacific varieties. Using Multi-VALUE's linguistically grounded transformations, we introduce D3 (Dialectal Disinformation Detection), a corpus of 195K samples derived from established disinformation benchmarks. Our evaluation of 16 detection models reveals systematic vulnerabilities: human-written dialectal content degrades detection by 1.4-3.6% F1, while AI-generated content remains stable. Fine-tuned transformers substantially outperform zero-shot LLMs (96.6% vs. 78.3% best-case F1), with some models exhibiting catastrophic failures exceeding 33% degradation on mixed content. Cross-dialectal transfer analysis across 2,450 dialect pairs shows that multilingual models (mDeBERTa: 97.2% average F1) generalize effectively, while models like RoBERTa and XLM-RoBERTa fail on dialectal inputs. These findings demonstrate that current disinformation detectors may systematically disadvantage hundreds of millions of non-SAE speakers worldwide. We release the DIA-HARM framework, D3 corpus, and evaluation tools: https://github.com/jsl5710/dia-harm
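The abstract's headline numbers rest on two simple quantities: the absolute F1 drop between SAE and dialect-transformed inputs, and the count of ordered cross-dialectal train/test pairs (50 dialects yield 50 × 49 = 2,450 pairs). The sketch below, which is an illustration rather than the authors' released evaluation code, shows how those quantities can be computed; the function names are hypothetical.

```python
# Illustrative sketch (not the DIA-HARM release code): computing binary F1
# and the dialect-robustness gap described in the abstract.

def f1_score(gold, pred, positive=1):
    """Binary F1 for the positive class, from parallel label lists."""
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def degradation(f1_sae, f1_dialect):
    """Absolute F1 drop (percentage points) when moving from SAE to a dialect."""
    return f1_sae - f1_dialect

# 50 dialects give 50 * 49 = 2,450 ordered train->test transfer pairs,
# matching the pair count quoted in the abstract.
n_dialects = 50
transfer_pairs = n_dialects * (n_dialects - 1)
```

In this framing, a reported "1.4-3.6% F1" degradation corresponds to `degradation` values of 1.4-3.6 when F1 is expressed in percent, and the cross-dialectal transfer analysis evaluates one model per ordered (train dialect, test dialect) pair.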
Problem

Research questions and friction points this paper is trying to address.

harmful content detection
disinformation classification
dialectal variation
English dialects
model robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

dialectal robustness
disinformation detection
DIA-HARM benchmark
cross-dialectal transfer
non-Standard English