AI Summary
This study systematically evaluates the effectiveness and evolutionary dynamics of X’s Community Notes crowdsourced content moderation system. To address the lack of long-term, structured empirical data in prior work, we construct the first large-scale, open-source dataset covering 2020–2024, featuring language detection across all Notes, fine-grained topic classification, extraction of embedded URLs, and monthly user collaboration networks. Methodologically, we combine NLP, language identification, dynamic topic modeling, and complex network analysis with a systematic literature review. Our key contributions are: (1) releasing the first four-year structured Community Notes dataset alongside a complete, reproducible analytical toolchain; (2) uncovering cross-lingual and cross-topical mechanisms of consensus formation and bias patterns in community moderation; and (3) establishing a replicable empirical framework and standardized research infrastructure for trustworthy platform content governance.
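To make the idea of a monthly collaboration network concrete, here is a minimal sketch under stated assumptions; it is not the authors' toolchain. It assumes that an edge links a rater to the author of the Note they rated, bucketed by the calendar month (UTC) of the rating. The field names `noteId`, `noteAuthorParticipantId`, `raterParticipantId`, and `createdAtMillis` are modeled on the public Community Notes TSV exports and should be treated as assumptions; the helper names `month_key` and `monthly_rater_author_edges` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_key(millis):
    """Convert epoch milliseconds to a 'YYYY-MM' string in UTC."""
    dt = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m")


def monthly_rater_author_edges(notes, ratings):
    """Build weighted rater -> author edge lists, one per calendar month.

    Hypothetical edge definition: a rater is linked to a Note's author
    each time the rater rates that Note.
    notes:   iterable of dicts with 'noteId' and 'noteAuthorParticipantId'
    ratings: iterable of dicts with 'noteId', 'raterParticipantId',
             'createdAtMillis' (int or numeric string, as read from a TSV)
    Returns {month: {(rater, author): rating_count}}.
    """
    author_of = {n["noteId"]: n["noteAuthorParticipantId"] for n in notes}
    networks = defaultdict(lambda: defaultdict(int))
    for r in ratings:
        author = author_of.get(r["noteId"])
        # Skip ratings on Notes missing from the notes table, and self-ratings.
        if author is None or author == r["raterParticipantId"]:
            continue
        month = month_key(int(r["createdAtMillis"]))
        networks[month][(r["raterParticipantId"], author)] += 1
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {m: dict(edges) for m, edges in networks.items()}
```

Rows could be streamed from the exported TSV files with `csv.DictReader(f, delimiter="\t")`; each monthly edge dictionary can then be loaded into a graph library for degree, clustering, or community analysis. Excluding self-ratings is a design choice of this sketch, not something the source specifies.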
Abstract
Community Notes (formerly known as Birdwatch) is the first large-scale crowdsourced content moderation initiative; it was launched by X (formerly known as Twitter) in January 2021. As the Community Notes model gains momentum across other social media platforms, there is a growing need to assess its underlying dynamics and effectiveness. This Resource paper provides (a) a systematic review of the literature on Community Notes, and (b) a large curated dataset and accompanying source code to support future research on Community Notes. We parsed Notes and Ratings data from the first four years of the program and conducted language detection across all Notes. Focusing on English-language Notes, we extracted embedded URLs and identified the discussion topics of each Note. Additionally, we constructed monthly interaction networks among the Contributors. Together with the literature review, these resources offer a robust foundation for advancing research on the Community Notes system.
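The URL-extraction step can be illustrated with a minimal sketch; the paper's actual extraction method is not described here, so this is an assumption-laden stand-in. It assumes each English-language Note's text is available as a plain string (in the public exports this is the `summary` column, also an assumption), uses a simple regular expression for `http`/`https` links, and tallies host names with the leading `www.` removed. The helper names `extract_urls` and `domain_counts` are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple pattern for http(s) URLs: stops at whitespace, angle brackets,
# quotes, and closing parentheses/brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text):
    """Return the http(s) URLs in text, with trailing sentence punctuation stripped."""
    return [u.rstrip(".,;:!?") for u in URL_RE.findall(text)]


def domain_counts(note_texts):
    """Count how often each host name (minus a leading 'www.') is cited across Notes."""
    counts = Counter()
    for text in note_texts:
        for url in extract_urls(text):
            host = urlparse(url).hostname  # lowercased, port removed; None if absent
            if not host:
                continue
            if host.startswith("www."):
                host = host[4:]
            counts[host] += 1
    return counts
```

A regex is a deliberately simple choice: it misses bare domains written without a scheme and may keep some trailing characters, so a production pipeline would likely need more careful normalization (for example, resolving shortened links).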