IYKYK (But AI Doesn't): Automated Content Moderation Does Not Capture Communities' Heterogeneous Attitudes Towards Reclaimed Language

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current AI content moderation systems struggle to distinguish reclaimed uses of slurs by marginalized groups from genuine hate speech, and as a result often remove legitimate expression. Focusing on LGBTQIA+, Black, and women's communities, this study combines annotations from in-group social media users, semi-structured interviews, and contextual feature analysis to build and evaluate a corpus of online slur usage annotated with hate speech judgments and contextual features. It finds low inter-annotator agreement within each community, which challenges the assumption that automated moderation can apply a single uniform standard. Features such as whether the slur is used in a derogatory way and whether it is targeted at oneself are associated with annotators' hate speech judgments. Hate speech assessments from Perspective API align poorly with community members' judgments, and interviews suggest that lived experience and personal history shape how slurs are interpreted.

📝 Abstract

Reclaimed slur usage is a common and meaningful practice online for many marginalized communities. It serves as a source of solidarity, identity, and shared experience. However, contemporary automated and AI-based moderation tools for online content largely fail to distinguish between reclaimed and hateful uses of slurs, resulting in the suppression of marginalized voices. In this work, we use quantitative and qualitative methods to examine the attitudes of social media users in LGBTQIA+, Black, and women's communities toward reclaimed slurs that target these groups, including the f-word, n-word, and b-word. With social media users from these communities, we collect and analyze an annotated corpus of online slur usage. The corpus includes annotators' perceptions of whether an online text containing a slur should be flagged as hate speech, as well as contextual features of the slur usage. Across all communities and annotation questions, we observe low inter-annotator agreement, indicating substantial disagreement among in-group annotators. This is compounded by the fact that, absent clear contextual signals of identity and intent, even in-group members may disagree on how to interpret reclaimed slur usage online. Semi-structured interviews with annotators suggest that differences in lived experience and personal history also contribute to this variation. We find poor alignment between annotator judgments and automated hate speech assessments produced by Perspective API. We further observe that certain features of a text, such as whether the slur usage was derogatory and whether the slur was targeted at oneself, are more strongly associated with whether annotators report the text as hate speech. Together, these findings highlight the inherent subjectivity and contextual nature of how marginalized communities interpret slurs online.
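To make the two quantitative claims in the abstract concrete, here is a minimal Python sketch of how inter-annotator agreement and alignment with an automated score could be measured. The abstract does not say which agreement statistic or threshold the authors used, so Fleiss' kappa, the 0.5 cutoff, the majority-vote rule, and all function names and numbers below are assumptions chosen for illustration. The scores are placeholder values, not real Perspective API output.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of annotators.

    ratings: list of lists of category labels, e.g. 1 = flag as hate speech, 0 = do not flag.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: share of agreeing annotator pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [sum(c[k] for c in counts) / (n_items * n_raters) for k in categories]
    p_expected = sum(p ** 2 for p in proportions)

    if p_expected == 1.0:
        return float("nan")  # only one category was ever used; kappa is undefined
    return (p_observed - p_expected) / (1 - p_expected)


def majority_flagged(item):
    """True if more than half of the annotators flagged the text (hypothetical rule)."""
    return sum(item) / len(item) > 0.5


def model_match_rate(ratings, scores, threshold=0.5):
    """Share of items where the thresholded score matches the annotator majority."""
    matches = sum(
        (score >= threshold) == majority_flagged(item)
        for item, score in zip(ratings, scores)
    )
    return matches / len(ratings)


# Toy data: four texts, three in-group annotators each, every text split 2-1.
toy_ratings = [[1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0]]
toy_scores = [0.9, 0.8, 0.2, 0.1]  # placeholder automated scores

kappa = fleiss_kappa(toy_ratings)
match_rate = model_match_rate(toy_ratings, toy_scores)
```

On this toy data every text splits the annotators 2 to 1, so kappa comes out below zero, which is the kind of low agreement the abstract describes. Collapsing such split judgments into a majority label discards that disagreement, which is one reason a single thresholded score can align poorly with community judgments.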

Problem

Research questions and friction points this paper is trying to address.

reclaimed slurs

automated content moderation

hate speech detection

marginalized communities

contextual interpretation

Innovation

Methods, ideas, or system contributions that make the work stand out.

reclaimed slurs

automated content moderation

hate speech detection