GMP: A Benchmark for Content Moderation under Co-occurring Violations and Dynamic Rules

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current AI content moderation systems struggle with co-occurring violations—where a single piece of content breaches multiple policies—and dynamic rules that evolve with contextual platform policies, often leading to erroneous removals or missed detections. To address these challenges, this work proposes GMP, the first content moderation benchmark designed for real-world scenarios, which systematically integrates co-occurring violations and dynamic rule enforcement. Built around large language models (LLMs), GMP introduces a comprehensive evaluation framework featuring multi-label violation classification and context-sensitive rule reasoning tasks. Experimental results demonstrate that state-of-the-art LLMs perform significantly worse on GMP compared to static benchmarks, exposing their fragility in complex, dynamic environments and establishing a new paradigm for evaluating robust content moderation systems.
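The paper does not publish its scoring code here, but the evaluation it describes (multi-label violation classification whose gold labels shift with the active rule set) can be sketched minimally. Everything below is illustrative: the function names, the rule dictionaries, and the sample data are assumptions, not GMP's actual implementation.

```python
# Hypothetical sketch of multi-label moderation scoring under
# platform-specific rules; names and data are illustrative only.

def active_labels(labels, rules):
    """Keep only labels the current rule set treats as violations."""
    return {l for l in labels if rules.get(l, False)}

def micro_f1(samples, rules):
    """Micro-averaged F1 over (predicted, gold) label-set pairs,
    after filtering both sides through the active rule set."""
    tp = fp = fn = 0
    for pred, gold in samples:
        p = active_labels(pred, rules)
        g = active_labels(gold, rules)
        tp += len(p & g)
        fp += len(p - g)
        fn += len(g - p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

# Two rule configurations: identical predictions score differently
# once the platform toggles which policies are enforced.
strict = {"prejudice": True, "personal_attack": True}
lenient = {"prejudice": True, "personal_attack": False}

samples = [
    ({"prejudice"}, {"prejudice", "personal_attack"}),
    ({"personal_attack"}, {"personal_attack"}),
]

print(micro_f1(samples, strict))   # 0.8  (the missed co-occurring attack counts)
print(micro_f1(samples, lenient))  # 1.0  (attacks are out of scope here)
```

The point of the filtering step is the benchmark's thesis in miniature: a model tuned to one static rule set can look perfect under `lenient` and visibly degrade under `strict` on the very same outputs.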

📝 Abstract
Online content moderation is essential for maintaining a healthy digital environment, and reliance on AI for this task continues to grow. Consider a user comment that uses national stereotypes to insult a politician. This example illustrates two critical challenges in real-world scenarios: (1) Co-occurring Violations, where a single post violates multiple policies (e.g., prejudice and personal attacks); (2) Dynamic Moderation Rules, where whether content constitutes a violation depends on platform-specific guidelines that evolve across contexts. The intersection of co-occurring harms and dynamically changing rules highlights a core limitation of current AI systems: although large language models (LLMs) are adept at following fixed guidelines, their judgment degrades when policies are unstable or context-dependent. In practice, such shortcomings lead to inconsistent moderation: either erroneously restricting legitimate expression or allowing harmful content to remain online. This raises a critical question for evaluation: does high performance on existing static benchmarks truly guarantee robust generalization of AI judgment to real-world scenarios involving co-occurring violations and dynamically changing rules?
Problem

Research questions and friction points this paper is trying to address.

Content Moderation
Co-occurring Violations
Dynamic Rules
AI Judgment
Benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

co-occurring violations
dynamic moderation rules
content moderation benchmark
LLM robustness
context-dependent policy
👥 Authors
Houde Dong (Beijing University of Posts and Telecommunications)
Yifei She (Beijing University of Posts and Telecommunications)
Kai Ye (The University of Hong Kong, Tsinghua University)
Liangcai Su (The University of Hong Kong, Tsinghua University)
Chenxiong Qian (The University of Hong Kong)
Jie Hao (Beijing University of Posts and Telecommunications)