🤖 AI Summary
This study addresses the challenge of multilingual hate speech detection in deepfake audio. We introduce SynHate, the first synthetic-speech hate detection dataset, covering 37 languages, and propose a fine-grained four-class annotation scheme—Real-normal, Real-hate, Fake-normal, Fake-hate—that formally defines the hate speech detection task for synthetic speech. The dataset is built from the MuTox and ADIMA corpora, pairing real recordings with synthetic counterparts generated by text-to-speech. We systematically evaluate leading self-supervised speech models—Whisper-small/medium, XLS-R, AST, and mHuBERT—on cross-lingual and cross-dataset generalization. Experiments reveal substantial performance disparities across languages and forgery domains: while Whisper-small achieves top performance on most languages, all models degrade severely under cross-dataset transfer. To foster reproducibility and further research, we publicly release the SynHate dataset and baseline code.
📝 Abstract
The rise of deepfake audio and hate speech, powered by advanced text-to-speech, threatens online safety. We present SynHate, the first multilingual dataset for detecting hate speech in synthetic audio, spanning 37 languages. SynHate uses a novel four-class scheme: Real-normal, Real-hate, Fake-normal, and Fake-hate. Built from the MuTox and ADIMA datasets, it captures diverse hate speech patterns globally and in India. We evaluate five leading self-supervised models (Whisper-small/medium, XLS-R, AST, mHuBERT), finding notable performance differences by language, with Whisper-small performing best overall. Cross-dataset generalization remains a challenge. By releasing SynHate and baseline code, we aim to advance robust, culturally sensitive, and multilingual solutions against synthetic hate speech. The dataset is available at https://www.iab-rubric.org/resources.
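The four-class scheme factors naturally into two binary axes: authenticity (real vs. fake) and content (normal vs. hate). A minimal sketch of one possible label encoding follows; the class names come from the paper, but the integer ids and helper function are illustrative assumptions, not the official release format.

```python
from enum import IntEnum

class SynHateLabel(IntEnum):
    """One possible encoding of SynHate's four classes.
    Class names follow the paper; the integer ids are an assumption."""
    REAL_NORMAL = 0
    REAL_HATE = 1
    FAKE_NORMAL = 2
    FAKE_HATE = 3

def make_label(is_fake: bool, is_hate: bool) -> SynHateLabel:
    """Compose the joint label from the two binary factors:
    high bit = authenticity (fake), low bit = content (hate)."""
    return SynHateLabel(2 * int(is_fake) + int(is_hate))

print(make_label(is_fake=True, is_hate=False).name)  # FAKE_NORMAL
```

Framing the task this way also makes it easy to report the two binary sub-tasks (spoof detection and hate detection) separately from the joint four-way accuracy.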