PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

📅 2026-05-16

🤖 AI Summary

This study addresses the challenge of detecting rule-violating content across diverse online communities, each governed by its own norms, which is a key requirement for pluralistic social media governance. The work formulates detection as a multiple-choice task (given a comment and its context, pick the violated rule, if any) and introduces PluRule, a multimodal, multilingual benchmark for violation detection under heterogeneous community rules. It covers 13,371 rule violations across 1,989 Reddit communities, 2,885 distinct rules, and nine languages. Experiments show that state-of-the-art vision-language models perform poorly: even GPT-5.2 with high reasoning only slightly outperforms a trivial baseline. Gains from larger models or additional context are marginal, while violations of more universal rules, such as civility and self-promotion, are easier to identify. The benchmark and these findings characterize how well current models handle norm-aware reasoning in rule-diverse environments.

📝 Abstract

Social media are shifting towards pluralism -- community-governed platforms where groups define their own norms. What violates rules in one community may be perfectly acceptable in another. Can AI models help moderate such pluralistic communities? We formalize the task as a multiple-choice problem, mirroring how human moderators operate in the real world: given a comment and its surrounding context, identify which specific rule, if any, is violated. We introduce PluRule, a multimodal, multilingual benchmark for detecting 13,371 rule violations across 1,989 Reddit communities spanning 2,885 rules in 9 languages. Using this benchmark, we show that state-of-the-art vision-language models struggle significantly: even GPT-5.2 with high reasoning performs only slightly better than a trivial baseline. We also find that bigger models and increased context provide marginal gains, and universal rules like civility and self-promotion are easier to detect. Our results show that moderation of pluralistic communities on social media is a fundamental challenge for language models. Our code and benchmark are publicly available.
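
To make the multiple-choice formulation concrete, here is a minimal Python sketch of how one item could be represented and scored. It is an illustration under stated assumptions, not the benchmark's actual schema or evaluation code: the `Item` fields, the `NO_VIOLATION` label, the prompt layout, and the majority-label baseline are all hypothetical, since the abstract does not specify them.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label for the "no rule is violated" option.
NO_VIOLATION = "none"


@dataclass
class Item:
    """One hypothetical item: a comment, its context, and the community's rules."""
    comment: str
    context: str
    rules: list[str]  # community-specific rules offered as answer options
    answer: str       # gold label: one of `rules`, or NO_VIOLATION


def format_prompt(item: Item) -> str:
    """Render an item as a lettered multiple-choice question (assumed layout)."""
    options = item.rules + [NO_VIOLATION]
    lines = [
        f"Context: {item.context}",
        f"Comment: {item.comment}",
        "Which rule is violated?",
    ]
    for i, option in enumerate(options):
        lines.append(f"({chr(ord('A') + i)}) {option}")
    return "\n".join(lines)


def accuracy(predictions: list[str], items: list[Item]) -> float:
    """Fraction of items whose predicted label matches the gold label."""
    correct = sum(pred == item.answer for pred, item in zip(predictions, items))
    return correct / len(items)


def majority_baseline(items: list[Item]) -> list[str]:
    """A trivial baseline (an assumption): always predict the most frequent gold label."""
    most_common = Counter(item.answer for item in items).most_common(1)[0][0]
    return [most_common] * len(items)


# Toy items invented for illustration; they are not drawn from the benchmark.
rules = ["Be civil", "No self-promotion"]
items = [
    Item("Buy my course!", "Thread about study tips", rules, "No self-promotion"),
    Item("You are an idiot.", "Debate thread", rules, "Be civil"),
    Item("Check out my channel.", "Music recommendations", rules, "No self-promotion"),
]
baseline_accuracy = accuracy(majority_baseline(items), items)
```

A model's predictions can be passed to the same `accuracy` function and compared against `baseline_accuracy`, which mirrors the kind of model-versus-trivial-baseline comparison the abstract reports.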

Problem

Research questions and friction points this paper is trying to address.

pluralistic moderation

community-specific rules

social media governance

rule violation detection

multilingual moderation

Innovation

Methods, ideas, or system contributions that make the work stand out.

pluralistic moderation

community-specific rules

multimodal benchmark