🤖 AI Summary
This work addresses the lack of standardized, comparable datasets in content moderation research, which has hindered systematic evaluation of the effects and possible biases of different moderation interventions. To bridge this gap, the authors introduce The Big Ban Theory (TBBT), a large dataset covering 25 moderation interventions of varying type, severity, and scope, over 339,000 users, and nearly 39 million messages. For each intervention, the dataset provides standardized metadata and pseudonymized user activity covering the three months before and the three months after enforcement. TBBT is intended to enable consistent, comparable, and reproducible analyses across interventions. The authors also present a descriptive exploratory analysis and several use cases showing how the dataset can support research on content moderation.
📝 Abstract
Online platforms rely on moderation interventions to curb harmful behavior such as hate speech, toxicity, and the spread of mis- and disinformation. Yet research on the effects and possible biases of such interventions faces multiple limitations. For example, existing works frequently focus on a single intervention or only a few, due to the absence of comprehensive datasets. As a result, researchers must typically collect the necessary data for each new study, which limits opportunities for systematic comparisons. To overcome these challenges, we introduce The Big Ban Theory (TBBT), a large dataset of moderation interventions. TBBT covers 25 interventions of varying type, severity, and scope, comprising in total over 339K users and nearly 39M posted messages. For each intervention, we provide standardized metadata and pseudonymized user activity collected three months before and after its enforcement, enabling consistent and comparable analyses of intervention effects. In addition, we provide a descriptive exploratory analysis of the dataset, along with several use cases showing how it can support research on content moderation. With this dataset, we aim to support researchers studying the effects of moderation interventions and to promote more systematic, reproducible, and comparable research. TBBT is publicly available at: https://doi.org/10.5281/zenodo.18245670.