Who Decides What Is Harmful? Content Moderation Policy Through A Multi-Agent Personalised Inference Framework

📅 2026-05-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of traditional content moderation systems: they rely on centralized rules and do not account for users’ subjective sensitivities to harmful content. The authors propose a personalized inference framework based on large language models that combines a multi-agent architecture with user sensitivity modeling. Domain-specific Expert Agents, a Manager Agent that orchestrates content analysis and agent selection, and a Ghost Profile Agent that simulates the user’s perspective together tailor content filtering to individual sensitivity profiles. Evaluated against a range of non-personalized baselines, the system achieves up to a 32% improvement in accuracy, indicating closer alignment between moderation outcomes and individual users’ sensitivities. The authors also present the framework as a source of policy-relevant insights for platform governance and as a scalable way to reconcile moderation policies with societal and individual digital rights.

📝 Abstract

The increasing scale and complexity of online platforms raise critical policy questions around harmful content, digital well-being, and user autonomy. Traditional content moderation systems rely on centralised, top-down rules and often fail to accommodate the subjective nature of harm perception. This paper proposes an LLM-based multi-agent personalised inference framework that filters content based on the unique sensitivity profiles of individual users. Our architecture combines domain-specific Expert Agents, a Manager Agent for orchestrating content analysis and agent selection, and a Ghost Profile Agent for simulating user perspectives, to inform moderation decisions. Evaluated against a range of non-personalised baselines, the system demonstrates up to a 32% improvement in accuracy, showing increased alignment with individual user sensitivities. Beyond technical performance, our framework provides policy-relevant insights for platform governance, offering a scalable way to reconcile moderation policies with societal and individual digital rights.
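
To make the three agent roles concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is an illustration under loud assumptions, not the authors' implementation: the abstract does not specify any interfaces, so every class, method, and field name below is hypothetical, and the keyword-based scoring is a stand-in for the LLM calls a real Expert Agent would make.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class SensitivityProfile:
    # Hypothetical per-domain tolerance in [0, 1]; lower means more sensitive.
    thresholds: Dict[str, float]
    default_threshold: float = 0.5


class ExpertAgent:
    """Stand-in for a domain-specific expert; a real one would query an LLM."""

    def __init__(self, domain: str, keywords: Set[str]):
        self.domain = domain
        self.keywords = keywords

    def score(self, text: str) -> float:
        # Toy severity: number of keyword hits, capped at 1.0 after two hits.
        words = [w.strip(".,!?") for w in text.lower().split()]
        hits = sum(1 for w in words if w in self.keywords)
        return min(1.0, hits / 2)


class GhostProfileAgent:
    """Simulates one user's perspective from their sensitivity profile."""

    def __init__(self, profile: SensitivityProfile):
        self.profile = profile

    def tolerates(self, domain: str, severity: float) -> bool:
        limit = self.profile.thresholds.get(domain, self.profile.default_threshold)
        return severity < limit


class ManagerAgent:
    """Selects the relevant experts and combines their views with the user's."""

    def __init__(self, experts: List[ExpertAgent]):
        self.experts = experts

    def moderate(self, text: str, ghost: GhostProfileAgent) -> str:
        for expert in self.experts:
            severity = expert.score(text)
            # Only experts that find something relevant are consulted further.
            if severity > 0 and not ghost.tolerates(expert.domain, severity):
                return "filter"
        return "show"


manager = ManagerAgent([
    ExpertAgent("violence", {"fight", "blood"}),
    ExpertAgent("profanity", {"damn"}),
])
sensitive_user = GhostProfileAgent(SensitivityProfile({"violence": 0.3}))
tolerant_user = GhostProfileAgent(SensitivityProfile({"violence": 0.8}))

post = "A fight broke out after the match."
decision_sensitive = manager.moderate(post, sensitive_user)
decision_tolerant = manager.moderate(post, tolerant_user)
```

In this sketch the same post receives different decisions for the two users, which is the behaviour a personalised framework aims for; a non-personalised baseline would correspond to applying one shared threshold to everyone.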

Problem

Research questions and friction points this paper is trying to address.

content moderation

harm perception

user autonomy

personalised inference

digital well-being

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent system

personalized content moderation

large language models