Evaluating Online Moderation Via LLM-Powered Counterfactual Simulations

📅 2025-11-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Empirical evaluation of content moderation policies in online social networks (OSNs) remains challenging due to the high cost of real-world data acquisition and poor experimental controllability. Method: We propose the first LLM-driven counterfactual simulation framework for moderation policy assessment, featuring psychologically grounded LLM agents that simulate user dialogue and social behavior under rigorously controlled conditions, enabling parallel evaluation of how diverse moderation interventions suppress toxic content. Contribution/Results: Our experiments successfully replicate social contagion effects, validating the psychological fidelity of agent behavior. Quantitative results demonstrate that personalized moderation significantly outperforms generic strategies in toxicity mitigation. The framework establishes a novel, reproducible, interpretable, and low-cost causal evaluation paradigm for OSN governance, bridging the gap between theoretical policy design and empirically grounded impact assessment.

📝 Abstract
Online Social Networks (OSNs) widely adopt content moderation to mitigate the spread of abusive and toxic discourse. Nonetheless, the real effectiveness of moderation interventions remains unclear due to the high cost of data collection and limited experimental control. The latest developments in Natural Language Processing pave the way for a new evaluation approach. Large Language Models (LLMs) can be successfully leveraged to enhance Agent-Based Modeling and simulate human-like social behavior with an unprecedented degree of believability. Yet, existing tools do not support simulation-based evaluation of moderation strategies. We fill this gap by designing an LLM-powered simulator of OSN conversations that enables a parallel, counterfactual simulation in which toxic behavior is influenced by moderation interventions, keeping all else equal. We conduct extensive experiments, unveiling the psychological realism of OSN agents, the emergence of social contagion phenomena, and the superior effectiveness of personalized moderation strategies.
Problem

Research questions and friction points this paper is trying to address.

Evaluating real effectiveness of online content moderation interventions
Simulating counterfactual social interactions using LLM-powered agents
Testing personalized moderation strategies through parallel conversation simulations
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-powered simulator for OSN conversations
Parallel counterfactual simulation of moderation interventions
Personalized moderation strategies show superior effectiveness
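The core evaluation idea — two conversation branches that differ only in whether a moderation intervention fires — can be sketched as a toy simulation. This is a hypothetical illustration, not the paper's implementation: the `Agent` heuristic stands in for the psychologically grounded LLM agents, and the toxicity numbers and threshold are invented. The key mechanism it demonstrates is the counterfactual control: both branches share the same random seed, so the intervention is the only difference.

```python
import random
from dataclasses import dataclass

@dataclass
class Agent:
    """Toy stand-in for an LLM-driven OSN user (hypothetical heuristic,
    not the paper's LLM agents)."""
    name: str
    base_toxicity: float  # propensity to post toxic replies, in [0, 1]

    def reply_toxicity(self, context_toxicity: float, rng: random.Random) -> float:
        # Social contagion: exposure to a toxic context raises reply toxicity.
        drift = 0.5 * context_toxicity
        return min(1.0, max(0.0, self.base_toxicity + drift + rng.gauss(0, 0.05)))

def simulate(agents, n_turns, moderate, seed=0):
    """Run one conversation branch and return its mean toxicity.
    Both branches use the same seed, so noise draws are identical and
    the moderation intervention is the only varied factor (all else equal)."""
    rng = random.Random(seed)
    context_tox = 0.0
    toxicities = []
    for turn in range(n_turns):
        agent = agents[turn % len(agents)]
        tox = agent.reply_toxicity(context_tox, rng)
        if moderate and tox > 0.6:
            tox *= 0.3  # intervention dampens the toxic post before it spreads
        toxicities.append(tox)
        context_tox = tox  # the reply becomes the next speaker's context
    return sum(toxicities) / len(toxicities)

agents = [Agent("a", 0.2), Agent("b", 0.7), Agent("c", 0.4)]
baseline = simulate(agents, 30, moderate=False, seed=42)
moderated = simulate(agents, 30, moderate=True, seed=42)
print(f"mean toxicity without moderation: {baseline:.3f}")
print(f"mean toxicity with moderation:    {moderated:.3f}")
```

Because moderation both caps the toxic post and lowers the context fed to subsequent agents, the moderated branch exhibits strictly lower mean toxicity; a personalized strategy would replace the fixed `0.6` threshold and `0.3` dampening with per-agent parameters.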