🤖 AI Summary
This work addresses the tendency of consensus-driven community moderation systems to incentivize strategic conformity, suppressing informative minority viewpoints and undermining independent judgment on contentious content. Focusing on the Community Notes feature on X (formerly Twitter), the authors document this conformity empirically, formalize it in a behavioral model, and propose a two-stage auditing and aggregation algorithm. In the first stage, contributor reliability is assessed with a latent-factor framework that scores the stability of each contributor's historical residuals (the deviations of their evaluations from the model's predictions) rather than their alignment with majority opinion. In the second stage, these stability-based estimates are used to weight contributions during note aggregation. Empirical evaluation shows that this approach improves out-of-sample predictive accuracy while preserving informational diversity and avoiding the marginalization of dissenting perspectives.
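Neither the summary nor the abstract pins down the exact estimator, but the two-stage logic can be sketched concretely. Below is a minimal Python/NumPy illustration: it substitutes a simple additive note-plus-contributor effects model for the paper's latent-factor framework and an inverse-variance rule for the stability score. All function names, the toy data, and these modeling choices are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def fit_residuals(ratings, note_ids, user_ids, n_iters=20):
    """Backfit a simple additive model: rating ~ mu + note effect + user effect.

    A stand-in for the paper's latent-factor model; the residual is the part
    of each rating that the shared structure cannot explain.
    """
    mu = ratings.mean()
    n_notes, n_users = note_ids.max() + 1, user_ids.max() + 1
    note_counts = np.maximum(np.bincount(note_ids, minlength=n_notes), 1)
    user_counts = np.maximum(np.bincount(user_ids, minlength=n_users), 1)
    note_eff, user_eff = np.zeros(n_notes), np.zeros(n_users)
    for _ in range(n_iters):
        note_eff = np.bincount(note_ids, weights=ratings - mu - user_eff[user_ids],
                               minlength=n_notes) / note_counts
        user_eff = np.bincount(user_ids, weights=ratings - mu - note_eff[note_ids],
                               minlength=n_users) / user_counts
    return ratings - mu - note_eff[note_ids] - user_eff[user_ids]

def stability_weights(residuals, user_ids, eps=1e-2):
    """Stage 1: weight contributors by how stable (low-variance) their
    residuals are, not by how often they agree with the majority.

    In practice one would shrink the variance estimates of low-count
    contributors; this toy version skips that.
    """
    n_users = user_ids.max() + 1
    counts = np.maximum(np.bincount(user_ids, minlength=n_users), 1)
    mean = np.bincount(user_ids, weights=residuals, minlength=n_users) / counts
    mean_sq = np.bincount(user_ids, weights=residuals ** 2, minlength=n_users) / counts
    resid_var = mean_sq - mean ** 2
    return 1.0 / (eps + resid_var)  # inverse-variance style score (assumed form)

def aggregate(ratings, note_ids, user_ids, weights):
    """Stage 2: stability-weighted average rating per note."""
    n_notes = note_ids.max() + 1
    w = weights[user_ids]
    num = np.bincount(note_ids, weights=w * ratings, minlength=n_notes)
    den = np.maximum(np.bincount(note_ids, weights=w, minlength=n_notes), 1e-12)
    return num / den

# Toy data: 3 notes rated by 4 contributors, helpfulness ratings in [0, 1].
rng = np.random.default_rng(0)
note_ids = np.array([0, 0, 0, 1, 1, 2, 2, 2])
user_ids = np.array([0, 1, 2, 0, 3, 1, 2, 3])
ratings = rng.random(8)

residuals = fit_residuals(ratings, note_ids, user_ids)
weights = stability_weights(residuals, user_ids)
print(aggregate(ratings, note_ids, user_ids, weights))  # one score per note
```

Even in this toy form, the design choice the paper emphasizes is visible: a contributor who consistently disagrees with the consensus but does so predictably keeps a low residual variance and therefore retains influence, whereas consensus-based auditing would down-weight them.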
📝 Abstract
Online social platforms increasingly rely on crowd-sourced systems to label misleading content at scale, but these systems must both aggregate users' evaluations and decide whose evaluations to trust. To address the latter, many platforms audit users by rewarding agreement with the final aggregate outcome, a design we term consensus-based auditing. We analyze the consequences of this design in X's Community Notes, which in September 2022 adopted such auditing, tying users' eligibility for participation to agreement with the eventual platform outcome. We find evidence of strategic conformity: minority contributors' evaluations drift toward the majority and their participation share falls on controversial topics, where independent signals matter most. We formalize this mechanism in a behavioral model in which contributors trade off private beliefs against anticipated penalties for disagreement. Motivated by these findings, we propose a two-stage auditing and aggregation algorithm that weights contributors by the stability of their past residuals rather than by agreement with the majority. The method first fits a latent-factor model that accounts for differences across content and contributors, and then measures how predictable each contributor's evaluations are relative to that model. Contributors whose evaluations are consistently informative receive greater influence in aggregation, even when they disagree with the prevailing consensus. In the Community Notes data, this approach improves out-of-sample predictive performance without penalizing disagreement itself.
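The abstract does not give the behavioral model's functional form. One conventional way to write such a trade-off, purely as an assumed illustration, is a quadratic loss in which contributor $i$ reports $r_i^*$ balancing a private belief $b_i$ against an anticipated consensus $\hat{m}$, with $\lambda \ge 0$ the strength of the disagreement penalty:

$$
r_i^* \;=\; \arg\min_{r}\; (r - b_i)^2 + \lambda\,(r - \hat{m})^2 \;=\; \frac{b_i + \lambda \hat{m}}{1 + \lambda}.
$$

As $\lambda$ grows, reports are pulled toward $\hat{m}$ regardless of $b_i$, reproducing the drift toward the majority that the paper documents; consensus-based auditing effectively raises $\lambda$ by attaching eligibility consequences to disagreement.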