LLM-C3MOD: A Human-LLM Collaborative System for Cross-Cultural Hate Speech Moderation

๐Ÿ“… 2025-03-10
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Global content moderation suffers from a linguistic resource imbalance: major platforms rely on moderators for high-resource languages, leaving low-resource languages underserved and prone to improper moderation when non-native moderators miss cultural context. This paper proposes LLM-C3MOD, a human-LLM collaborative moderation pipeline with three stages: (1) RAG-enhanced cultural context annotations that retrieve and inject culturally specific background knowledge into moderation prompts; (2) initial LLM-based moderation using multiple independent LLM judgments; and (3) targeted human review of cases where the LLMs fail to reach consensus. Evaluated on a Korean hate speech dataset with Indonesian and German participants, the system achieves 78% accuracy, 7 percentage points above the GPT-4o baseline, while reducing human workload by 83.6%. The results indicate that non-native moderators, supported by culturally informed LLMs, can contribute effectively to cross-cultural content moderation.

๐Ÿ“ Abstract
Content moderation is a global challenge, yet major tech platforms prioritize high-resource languages, leaving low-resource languages with scarce native moderators. Since effective moderation depends on understanding contextual cues, this imbalance increases the risk of improper moderation due to non-native moderators' limited cultural understanding. Through a user study, we identify that non-native moderators struggle with interpreting culturally specific knowledge, sentiment, and internet culture in hate speech moderation. To assist them, we present LLM-C3MOD, a human-LLM collaborative pipeline with three steps: (1) RAG-enhanced cultural context annotations; (2) initial LLM-based moderation; and (3) targeted human moderation for cases lacking LLM consensus. Evaluated on a Korean hate speech dataset with Indonesian and German participants, our system achieves 78% accuracy (surpassing GPT-4o's 71% baseline) while reducing human workload by 83.6%. Notably, human moderators excel at nuanced content where LLMs struggle. Our findings suggest that non-native moderators, when properly supported by LLMs, can effectively contribute to cross-cultural hate speech moderation.
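
Read operationally, the three steps reduce to a simple routing rule: annotate the post with retrieved cultural context, collect several independent LLM judgments, and escalate to a human only when those judgments disagree. Below is a minimal Python sketch of that control flow; `retrieve_cultural_context`, `llm_classify`, and the prompt format are hypothetical placeholders, not the paper's actual API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ModerationResult:
    label: str          # "hate", "not_hate", or "undecided"
    needs_human: bool   # True when the LLM panel disagrees


def moderate(
    post: str,
    retrieve_cultural_context: Callable[[str], str],  # hypothetical RAG retriever
    llm_classify: Callable[[str], str],               # hypothetical LLM call
    n_judges: int = 3,
) -> ModerationResult:
    # Step 1: RAG-enhanced cultural context annotation.
    context = retrieve_cultural_context(post)
    prompt = f"Cultural notes: {context}\nPost: {post}\nLabel as hate/not_hate."

    # Step 2: initial LLM-based moderation with multiple independent judgments.
    votes = [llm_classify(prompt) for _ in range(n_judges)]

    # Step 3: unanimous votes are auto-resolved; disagreements are escalated
    # to (non-native) human moderators, who handle only this residual slice.
    if len(set(votes)) == 1:
        return ModerationResult(label=votes[0], needs_human=False)
    return ModerationResult(label="undecided", needs_human=True)
```

The reported 83.6% workload reduction corresponds to the fraction of posts resolved at the consensus check, leaving only disagreement cases for human review.
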
Problem

Research questions and friction points this paper is trying to address.

Major platforms prioritize high-resource languages, leaving low-resource languages with scarce native moderators
Non-native moderators misinterpret culturally specific knowledge, sentiment, and internet culture, increasing the risk of improper moderation
How to support non-native moderators with LLMs so they can moderate cross-cultural hate speech accurately
Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG-enhanced cultural context annotations (see the retrieval sketch after this list)
Initial LLM-based moderation with multiple independent LLM judgments
Targeted human moderation for cases lacking LLM consensus
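
The first innovation, RAG-enhanced cultural context annotation, amounts to retrieving relevant notes from a cultural knowledge base and prepending them to the moderation prompt. The sketch below illustrates the idea under stated assumptions: `embed` is a toy character-histogram placeholder for a real sentence-embedding model, and the `CULTURAL_NOTES` entries are invented examples, not taken from the paper.

```python
import math


def embed(text: str) -> list[float]:
    # Toy embedding: character histogram. A real system would call a
    # sentence-embedding model here.
    vec = [0.0] * 256
    for ch in text.lower():
        vec[ord(ch) % 256] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


CULTURAL_NOTES = [
    # Hypothetical examples of culturally specific annotations.
    "Slang term X is a derogatory reference to group Y in Korean internet culture.",
    "Phrase Z is a harmless meme despite its literal translation.",
]


def annotate_with_context(post: str, top_k: int = 1) -> str:
    # Retrieve the top-k most similar cultural notes and prepend them,
    # so a non-native reader (or an LLM) sees the missing context.
    query = embed(post)
    scored = sorted(CULTURAL_NOTES, key=lambda n: cosine(query, embed(n)), reverse=True)
    notes = "\n".join(scored[:top_k])
    return f"Cultural notes:\n{notes}\n\nPost:\n{post}"
```

A production system would replace the toy embedding with a proper retriever over curated cultural knowledge; the sketch only shows where the retrieved notes enter the moderation prompt.
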
๐Ÿ”Ž Similar Papers
No similar papers found.