🤖 AI Summary
This work addresses spurious false-positive bias in text toxicity detection, where mere mentions of demographic groups trigger erroneous toxicity classifications, and presents the first empirical investigation into whether the speech modality mitigates this bias. Building on the multilingual MuTox dataset, we introduce fine-grained manual demographic annotations and propose a joint speech-text modeling framework. Using fairness metrics, including demographic parity difference (DPD) and equalized odds (EO), we systematically compare bias in speech versus text classifiers. Results show that speech input reduces group-associated false-positive rates by 23% on average, with especially pronounced gains on ambiguous and contentious samples; moreover, improving the classifier itself yields greater bias reduction than improving ASR transcription quality. Our contributions: (1) an empirical demonstration of speech's bias-mitigation potential; (2) public release of the annotations, with guidelines for constructing future multimodal toxicity datasets; and (3) practical recommendations for fair, robust cross-modal content-safety detection.
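For readers unfamiliar with the two fairness metrics named above, here is a minimal, self-contained sketch of how DPD and an equalized-odds gap are typically computed for a binary classifier and a two-group attribute. This is illustrative toy code with hypothetical data, not the paper's evaluation pipeline.

```python
def rate(preds):
    """Fraction of positive predictions; 0.0 on an empty selection."""
    return sum(preds) / len(preds) if preds else 0.0

def dpd(y_pred, groups):
    """Demographic parity difference: |P(yhat=1 | g=a) - P(yhat=1 | g=b)|."""
    a, b = sorted(set(groups))
    rate_a = rate([p for p, g in zip(y_pred, groups) if g == a])
    rate_b = rate([p for p, g in zip(y_pred, groups) if g == b])
    return abs(rate_a - rate_b)

def eo_gap(y_true, y_pred, groups):
    """Equalized-odds gap: max disparity across groups in TPR and FPR."""
    a, b = sorted(set(groups))
    def cond_rate(gname, label):
        # P(yhat=1 | g=gname, y=label): TPR when label=1, FPR when label=0.
        sel = [p for p, t, g in zip(y_pred, y_true, groups)
               if g == gname and t == label]
        return rate(sel)
    tpr_gap = abs(cond_rate(a, 1) - cond_rate(b, 1))
    fpr_gap = abs(cond_rate(a, 0) - cond_rate(b, 0))
    return max(tpr_gap, fpr_gap)

# Toy example: group "a" receives more false positives than group "b".
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
groups = ["a"] * 6 + ["b"] * 6
y_pred = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]

print(round(dpd(y_pred, groups), 3))          # 0.333
print(round(eo_gap(y_true, y_pred, groups), 3))  # 0.5 (driven by the FPR gap)
```

In this toy case both classifiers have perfect TPR, so the entire EO gap comes from the false-positive-rate disparity, which is exactly the kind of group-mention bias the work measures.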
📝 Abstract
Text toxicity detection systems exhibit significant biases, producing disproportionate rates of false positives on samples mentioning demographic groups. But what about toxicity detection in speech? To investigate the extent to which text-based biases are mitigated by speech-based systems, we produce a set of high-quality group annotations for the multilingual MuTox dataset, and then leverage these annotations to systematically compare speech- and text-based toxicity classifiers. Our findings indicate that access to speech data during inference supports reduced bias against group mentions, particularly for ambiguous and disagreement-inducing samples. Our results also suggest that improving classifiers, rather than transcription pipelines, is more helpful for reducing group bias. We publicly release our annotations and provide recommendations for future toxicity dataset construction.