🤖 AI Summary
This study addresses the scarcity of effective hate speech detection tools for low-resource Southeast Asian languages (Indonesian, Tagalog, Thai, and Vietnamese) by introducing the first culturally grounded, functionally oriented evaluation benchmark for them. Building on the HateCheck framework, the authors use large language models to generate test cases, which are then validated by native linguistic experts and used to evaluate state-of-the-art multilingual pretrained models. The analysis systematically uncovers blind spots in current models, particularly their inability to detect implicit hate and to handle counter-speech. Experiments show the lowest accuracy on Tagalog, likely owing to its linguistic complexity and limited training data, while slang-based functional tests proved the hardest overall, as models struggled with culturally nuanced expressions. These findings provide diagnostic insights to guide future improvements in hate speech detection for underrepresented languages.
📝 Abstract
Hate speech detection relies heavily on linguistic resources that are primarily available in high-resource languages such as English and Chinese. This creates barriers for researchers and platforms developing tools for low-resource languages in Southeast Asia, where diverse socio-linguistic contexts complicate online hate moderation. To address this, we introduce SEAHateCheck, a pioneering dataset tailored to Indonesia, Thailand, the Philippines, and Vietnam, covering Indonesian, Tagalog, Thai, and Vietnamese. Building on HateCheck's functional testing framework and refining SGHateCheck's methods, SEAHateCheck provides culturally relevant test cases, augmented by large language models and validated by local experts for accuracy. Experiments with state-of-the-art multilingual models revealed limitations in detecting hate speech in specific low-resource languages. In particular, Tagalog test cases showed the lowest model accuracy, likely due to linguistic complexity and limited training data. Among functional tests, slang-based ones proved the hardest, as models struggled with culturally nuanced expressions. SEAHateCheck's diagnostic insights further exposed model weaknesses in detecting implicit hate and in handling counter-speech expressions. As the first functional test suite for these Southeast Asian languages, this work equips researchers with a robust benchmark, advancing the development of practical, culturally attuned hate speech detection tools for inclusive online content moderation.
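The functional-testing setup described in the abstract can be sketched as follows. This is a minimal illustration of the HateCheck-style evaluation pattern (labeled test cases tagged by functionality, scored per functionality), not actual SEAHateCheck data or code; the test cases, field names, and `toy_model` classifier are hypothetical.

```python
# Sketch of HateCheck-style functional testing: each test case pairs an input
# with a gold label and a functionality tag, and a classifier is scored per
# functionality, exposing targeted weaknesses (e.g., slang, counter-speech).
from collections import defaultdict

# Hypothetical test cases (placeholders, not real SEAHateCheck data).
test_cases = [
    {"func": "explicit_slur", "text": "...", "label": "hateful"},
    {"func": "counter_speech", "text": "...", "label": "non-hateful"},
    {"func": "slang", "text": "...", "label": "hateful"},
]

def toy_model(text):
    # Stand-in classifier that predicts "hateful" for everything; a real
    # evaluation would query a multilingual pretrained model instead.
    return "hateful"

def per_functionality_accuracy(cases, model):
    correct, total = defaultdict(int), defaultdict(int)
    for case in cases:
        total[case["func"]] += 1
        if model(case["text"]) == case["label"]:
            correct[case["func"]] += 1
    return {func: correct[func] / total[func] for func in total}

print(per_functionality_accuracy(test_cases, toy_model))
# The always-"hateful" stand-in scores 1.0 on hateful functionalities but
# 0.0 on counter_speech, mirroring the kind of blind spot the suite reveals.
```

Reporting accuracy per functionality rather than in aggregate is what makes the suite diagnostic: a model can look strong overall while failing an entire category such as counter-speech or slang.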