Improving Implicit Hate Speech Detection via a Community-Driven Multi-Agent Framework

📅 2026-01-14

🤖 AI Summary

This work addresses the detection of implicit hate speech, which lacks explicit keywords and depends heavily on sociocultural context, making it hard for existing detection methods to catch. The authors propose a multi-agent system comprising a central Moderator Agent and dynamically constructed Community Agents, each representing a specific demographic group. A community-driven consultation mechanism explicitly integrates sociocultural background knowledge from publicly available sources, enabling identity-aware hate speech detection. Built on large language model prompting with external knowledge integration, and evaluated with balanced accuracy as the central fairness metric, the approach is reported to outperform state-of-the-art prompting strategies (zero-shot, few-shot, and chain-of-thought) on the ToxiGen dataset. According to the authors, it improves both classification accuracy and fairness across all target groups.
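The summary does not give implementation details, so the following is only a minimal sketch of how a moderator agent might consult per-group community agents. Every name here (`CommunityAgent`, `ModeratorAgent`, `stub_llm`, the `knowledge` dictionary, the prompt wording) is a hypothetical stand-in, not the authors' code, and the stub replaces a real language model so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical LLM interface: takes a prompt string, returns generated text.
LLM = Callable[[str], str]


@dataclass
class CommunityAgent:
    """Assumed agent that judges a post from one community's perspective."""
    group: str
    context: str  # socio-cultural background retrieved for this group
    llm: LLM

    def assess(self, text: str) -> str:
        prompt = (
            f"You represent the community: {self.group}.\n"
            f"Background: {self.context}\n"
            "Is the following post hateful towards this community? "
            "Answer 'hateful' or 'not hateful'.\n"
            f"Post: {text}"
        )
        return self.llm(prompt)


class ModeratorAgent:
    """Assumed central agent that builds community agents and decides."""

    def __init__(self, llm: LLM, knowledge: Dict[str, str]):
        self.llm = llm
        self.knowledge = knowledge  # stand-in for an external knowledge source

    def moderate(self, text: str, target_groups: List[str]) -> str:
        # Construct one community agent per group at request time.
        agents = [
            CommunityAgent(g, self.knowledge.get(g, ""), self.llm)
            for g in target_groups
        ]
        opinions = "\n".join(f"- {a.group}: {a.assess(text)}" for a in agents)
        prompt = (
            "Community assessments:\n"
            f"{opinions}\n"
            "Give a final label, 'hateful' or 'not hateful'.\n"
            f"Post: {text}"
        )
        verdict = self.llm(prompt).strip().lower()
        return "hateful" if verdict.startswith("hateful") else "not hateful"


def stub_llm(prompt: str) -> str:
    """Toy stand-in for a real model: flags a placeholder marker only."""
    return "hateful" if "[coded phrase]" in prompt else "not hateful"


moderator = ModeratorAgent(stub_llm, {"group_a": "Background notes on group_a."})
print(moderator.moderate("A post containing [coded phrase].", ["group_a"]))
```

In a real system the stub would be replaced by a call to a language model, and the dictionary by retrieval from the public knowledge sources the abstract mentions.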

📝 Abstract

This work proposes a contextualised detection framework for implicitly hateful speech, implemented as a multi-agent system comprising a central Moderator Agent and dynamically constructed Community Agents representing specific demographic groups. Our approach explicitly integrates socio-cultural context from publicly available knowledge sources, enabling identity-aware moderation that surpasses state-of-the-art prompting methods (zero-shot prompting, few-shot prompting, chain-of-thought prompting) and alternative approaches on the challenging ToxiGen dataset. We enhance the technical rigour of performance evaluation by incorporating balanced accuracy as a central metric of classification fairness that accounts for the trade-off between true positive and true negative rates. We demonstrate that our community-driven consultative framework significantly improves both classification accuracy and fairness across all target groups.
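For binary classification, balanced accuracy is the mean of the true positive rate and the true negative rate, BA = (TPR + TNR) / 2. The sketch below computes it overall and per target group. The function names, the label encoding (1 = hateful, 0 = benign), and the toy data are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of true positive rate and true negative rate (binary labels)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        # Assumption: require both classes rather than guess a rate.
        raise ValueError("both classes must be present in y_true")
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return (tpr + tnr) / 2


def per_group_balanced_accuracy(
    y_true: Sequence[int], y_pred: Sequence[int], groups: Sequence[str]
) -> Dict[str, float]:
    """Balanced accuracy computed separately for each target group."""
    by_group: Dict[str, List[List[int]]] = defaultdict(lambda: [[], []])
    for t, p, g in zip(y_true, y_pred, groups):
        by_group[g][0].append(t)
        by_group[g][1].append(p)
    return {g: balanced_accuracy(ts, ps) for g, (ts, ps) in by_group.items()}


# Toy data (hypothetical): 1 = hateful, 0 = benign.
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(balanced_accuracy(y_true, y_pred))
print(per_group_balanced_accuracy(y_true, y_pred, groups))
```

Unlike plain accuracy, this metric does not reward a classifier for simply predicting the majority class, which is why it suits per-group comparisons where class balance differs between groups.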

Problem

Research questions and friction points this paper is trying to address.

implicit hate speech

detection

socio-cultural context

classification fairness

multi-agent system

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent system

implicit hate speech detection

community-driven framework