🤖 AI Summary
This study addresses the challenge of detecting implicit hate speech, such as coded language, dog whistles, and racial gaslighting, in social media, where single-post analysis proves insufficient. We propose a user-level, multimodal approach to identifying hate propagators by jointly modeling textual semantics (via NLP), behavioral patterns (e.g., posting frequency and topic preferences), and social network structure (via graph neural networks) within their broader sociocultural context. The key contribution is shifting the granularity of hate detection from *content* to *user*, enabling a holistic view that generalizes across platforms (Twitter, Gab, and Parler). Experiments demonstrate significant improvements over unimodal baselines, particularly text-only and graph-only models, in precision, robustness, and the detection of implicit expressions. The method offers a scalable, deployable framework for large-scale governance of online hate.
📝 Abstract
Automatic detection of online hate speech is a crucial step in the detoxification of online discourse. Moreover, accurate classification can promote a better understanding of the proliferation of hate as a social phenomenon. While most prior work focuses on the detection of hateful utterances, we argue that focusing on the user level is just as important, albeit challenging. In this paper, we consider a multimodal aggregative approach for the detection of hate-mongers, taking into account the potentially hateful texts, user activity, and the user network. Evaluating our method on three unique datasets, X (Twitter), Gab, and Parler, we show that processing a user's texts in her social context significantly improves the detection of hate-mongers, compared to previously used text-based and graph-based methods. We offer a comprehensive set of results obtained in different experimental settings as well as a qualitative analysis of illustrative cases. Our method can be used to improve the classification of coded messages, dog-whistling, and racial gaslighting, as well as to inform intervention measures. Moreover, we demonstrate that our multimodal approach performs well across very different content platforms and over large datasets and networks.
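As a rough illustration of the multimodal aggregation described above, a user-level representation can be built by fusing three feature blocks: an aggregate of the user's post embeddings, a behavioral feature vector, and a graph embedding of the user's network position. The sketch below is not the paper's implementation; all dimensions, feature names, and the placeholder linear scoring head are illustrative assumptions standing in for the trained encoders and fusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-user inputs (all names and dimensions are illustrative):
post_embeddings = rng.normal(size=(12, 16))  # one vector per post, e.g. from a text encoder
behavioral = np.array([0.8, 0.1, 3.5])       # e.g. posting frequency, reply ratio, topic entropy
graph_embedding = rng.normal(size=8)         # e.g. a node embedding from a GNN over the user network

# Aggregate the user's texts, then concatenate all three modalities into one user vector.
text_repr = post_embeddings.mean(axis=0)
user_vector = np.concatenate([text_repr, behavioral, graph_embedding])

# A placeholder logistic head stands in for the trained fusion classifier.
weights = rng.normal(size=user_vector.shape[0])
score = 1.0 / (1.0 + np.exp(-(user_vector @ weights)))  # pseudo-probability the user is a hate-monger
print(user_vector.shape, float(score))
```

The point of the sketch is the granularity: classification happens once per user over the fused vector, rather than once per post, which is what lets weak signals (coded posts, activity patterns, hateful neighborhoods) reinforce each other.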