The Enforcement and Feasibility of Hate Speech Moderation on Twitter

📅 2026-04-14

🤖 AI Summary

This study addresses the inconsistent enforcement and uncertain scalability of hate speech moderation on social media platforms. It audits hate speech moderation on Twitter (now X) using representative samples of 540,000 tweets, annotated by trained annotators across eight major languages, and pairs this human annotation with simulations of automated moderation. Five months after posting, 80% of hateful tweets remain online, and neither severity nor visibility increases the likelihood of removal. Building on these findings, the work simulates a human–AI moderation pipeline in which automated detection prioritizes likely violations for human review. The simulations indicate that substantially reducing users' exposure to hate speech is economically feasible at a cost below existing regulatory penalties, which suggests that the enforcement gap reflects institutional choices as well as technical constraints.

📝 Abstract

Online hate speech is associated with substantial social harms, yet it remains unclear how consistently platforms enforce hate speech policies or whether enforcement is feasible at scale. We address these questions through a global audit of hate speech moderation on Twitter (now X). Using a complete 24-hour snapshot of public tweets, we construct representative samples comprising 540,000 tweets annotated for hate speech by trained annotators across eight major languages. Five months after posting, 80% of hateful tweets remain online, including explicitly violent hate speech. Such tweets are no more likely to be removed than non-hateful tweets, with neither severity nor visibility increasing the likelihood of removal. We then examine whether these enforcement gaps reflect technical limits of large-scale moderation systems. While fully automated detection systems cannot reliably identify hate speech without generating large numbers of false positives, they effectively prioritize likely violations for human review. Simulations of a human-AI moderation pipeline indicate that substantially reducing user exposure to hate speech is economically feasible at a cost below existing regulatory penalties. These results suggest that the persistence of online hate cannot be explained by technical constraints alone but also reflects institutional choices in the allocation of moderation resources.
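
The prioritization idea in the abstract (automated detection ranks likely violations, humans review the top of the queue) can be illustrated with a minimal sketch. This is not the authors' simulation: the `Tweet` fields, the `simulate_review` function, the toy data, the review budget, and the cost per review are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    score: float      # hypothetical classifier score in [0, 1]
    views: int        # hypothetical view count, used as a proxy for exposure
    is_hateful: bool  # label a human reviewer would assign


def simulate_review(tweets, review_budget, cost_per_review):
    """Review the highest-scored tweets and remove those confirmed hateful.

    Returns the share of hateful views removed, the number of non-hateful
    tweets that reached a reviewer, and the total review cost.
    """
    # Rank by classifier score so reviewers see the likeliest violations first.
    ranked = sorted(tweets, key=lambda t: t.score, reverse=True)
    reviewed = ranked[:review_budget]

    total_hate_views = sum(t.views for t in tweets if t.is_hateful)
    removed_views = sum(t.views for t in reviewed if t.is_hateful)

    return {
        "exposure_reduction": (
            removed_views / total_hate_views if total_hate_views else 0.0
        ),
        # Flagged but cleared by the human reviewer, so nothing is removed.
        "non_hateful_reviewed": sum(1 for t in reviewed if not t.is_hateful),
        "cost": len(reviewed) * cost_per_review,
    }


# Toy data: all scores, view counts, and labels are made up.
toy_tweets = [
    Tweet(score=0.95, views=1000, is_hateful=True),
    Tweet(score=0.90, views=50, is_hateful=False),
    Tweet(score=0.80, views=300, is_hateful=True),
    Tweet(score=0.40, views=200, is_hateful=True),
    Tweet(score=0.30, views=5000, is_hateful=False),
    Tweet(score=0.10, views=10, is_hateful=False),
]

result = simulate_review(toy_tweets, review_budget=3, cost_per_review=0.5)
print(result)
```

In this toy setup, reviewing three of six tweets removes 1,300 of the 1,500 hateful views, and the one non-hateful tweet that was flagged stays online because a human clears it. The sketch only shows why ranking plus human review can cut exposure without the false-positive removals of fully automated moderation; it says nothing about the real costs or detection quality reported in the paper.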

Problem

Research questions and friction points this paper is trying to address.

hate speech

content moderation

platform enforcement

social media

feasibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

hate speech moderation

human-AI pipeline

content moderation audit