Quantifying Feature Importance for Online Content Moderation

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of accurately predicting user behavioral responses to content moderation interventions to inform user-centered policy design. Leveraging pre- and post-intervention behavioral data from 16,800 Reddit users, we propose a novel feature importance assessment framework that integrates quantification learning with greedy feature selection, modeling 753-dimensional features spanning social behavior, linguistic patterns, relational networks, and psychological attributes. Results reveal that response heterogeneity is jointly driven by a small set of cross-task generalizable features—such as historical participation stability—and numerous task-specific features. The model achieves strong performance in predicting changes in activity levels and toxicity, while diversity prediction remains comparatively challenging. These findings empirically characterize the multidimensional and heterogeneous nature of user responses, providing both methodological grounding and empirical support for interpretable, customizable content moderation strategies.

📝 Abstract
Accurately estimating how users respond to moderation interventions is paramount for developing effective and user-centred moderation strategies. However, this requires a clear understanding of which user characteristics are associated with different behavioural responses, which is the goal of this work. We investigate the informativeness of 753 socio-behavioural, linguistic, relational, and psychological features in predicting the behavioural changes of 16.8K users affected by a major moderation intervention on Reddit. To reach this goal, we frame the problem in terms of "quantification", a task well-suited to estimating shifts in aggregate user behaviour. We then apply a greedy feature selection strategy with the double goal of (i) identifying the features that are most predictive of changes in user activity, toxicity, and participation diversity, and (ii) estimating their importance. Our results identify a small set of features that are consistently informative across all tasks, and show that many others are either task-specific or of limited utility altogether. We also find that predictive performance varies according to the task, with changes in activity and toxicity being easier to estimate than changes in diversity. Overall, our results pave the way for the development of accurate systems that predict user reactions to moderation interventions. Furthermore, our findings highlight the complexity of post-moderation user behaviour, and indicate that effective moderation should be tailored not only to user traits but also to the specific objective of the intervention.
Problem

Research questions and friction points this paper is trying to address.

Identifying key user features predicting behavioral changes after moderation
Quantifying feature importance for user activity, toxicity, and diversity shifts
Developing accurate systems to predict user reactions to moderation interventions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Greedy feature selection identifies predictive user characteristics
Quantification framework estimates aggregate user behavior shifts
Analyzes socio-behavioral, linguistic, relational, and psychological features
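The combination of quantification and greedy feature selection described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses a simple Classify & Count quantifier built on a nearest-centroid classifier, a toy two-feature dataset, and hypothetical function names (`classify_and_count`, `greedy_select`); the paper's actual quantification method, features, and scoring are far richer.

```python
from statistics import mean

# Toy data: (feature_vector, label) pairs. Feature 0 is informative,
# feature 1 is noise. Labels stand in for a binary behavioural response.
train = [([0.0, 0.5], 0), ([0.1, 0.4], 0), ([1.0, 0.5], 1), ([0.9, 0.6], 1)]
test = [([0.05, 0.5], 0), ([0.95, 0.5], 1), ([1.0, 0.4], 1)]


def classify_and_count(selected, train, test):
    """Classify & Count quantification: fit a nearest-centroid classifier
    on the selected feature indices, classify each test point, and return
    the estimated positive-class prevalence."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    c_pos = [mean(x[i] for x in pos) for i in selected]
    c_neg = [mean(x[i] for x in neg) for i in selected]

    def predict(x):
        d_pos = sum((x[i] - c) ** 2 for i, c in zip(selected, c_pos))
        d_neg = sum((x[i] - c) ** 2 for i, c in zip(selected, c_neg))
        return 1 if d_pos < d_neg else 0

    return mean(predict(x) for x, _ in test)


def greedy_select(n_features, train, test, k):
    """Greedy forward selection: at each step, add the feature that most
    reduces the quantification error (absolute difference between the
    estimated and true prevalence). Returns the selected features and the
    error reached after each addition, a rough importance signal."""
    true_prev = mean(y for _, y in test)
    selected, error_after = [], {}
    for _ in range(k):
        best, best_err = None, float("inf")
        for f in range(n_features):
            if f in selected:
                continue
            est = classify_and_count(selected + [f], train, test)
            err = abs(est - true_prev)
            if err < best_err:
                best, best_err = f, err
        selected.append(best)
        error_after[best] = best_err
    return selected, error_after
```

On this toy data, the informative feature is picked first because adding it drives the quantification error to zero, while the noise feature alone misestimates the prevalence:

```python
selected, errors = greedy_select(2, train, test, k=1)
# selected == [0], errors[0] == 0.0
```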