🤖 AI Summary
This study identifies a systemic social bias in online hate-content reporting mechanisms: users are significantly more likely to report abusive content targeting their own social group (in-group) than identical content targeting out-groups, thereby undermining platform-level harm mitigation. Through five high-statistical-power, pre-registered online experiments spanning four sensitive domains—political orientation, vaccine attitudes, climate change beliefs, and abortion rights—we find that approximately 50% of abusive comments are reported, with a robust and statistically significant in-group reporting bias across all contexts (mean effect size *d* = 0.38). This work provides the first multi-domain, ecologically valid empirical demonstration of identity-dependent reporting behavior. It establishes critical behavioral evidence and theoretical grounding for redesigning fairer, more robust content moderation systems that account for sociopsychological reporting biases.
📝 Abstract
The prevalence of online hate and abuse is a pressing global concern. While tackling such societal harms is a priority for research across the social sciences, it is a difficult task, in part because of the magnitude of the problem. User engagement with reporting mechanisms (flagging) online is an increasingly important part of monitoring and addressing harmful content at scale. However, users may not flag content routinely enough, and when they do engage, they may be biased by group identity and political beliefs. Across five well-powered and pre-registered online experiments, we examine the extent of social bias in the flagging of hate and abuse in four different intergroup contexts: political affiliation, vaccination opinions, beliefs about climate change, and stance on abortion rights. Overall, participants reported abuse reliably, with approximately half of the abusive comments in each study reported. However, a pervasive social bias was present whereby ingroup-directed abuse was consistently flagged to a greater extent than outgroup-directed abuse. Our findings offer new insights into the nature of user flagging online, an understanding of which is crucial for enhancing user intervention against online hate speech and thus ensuring a safer online environment.