Who Decides What Is Harmful? Content Moderation Policy Through A Multi-Agent Personalised Inference Framework

📅 2026-05-02
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a limitation of traditional content moderation systems: they rely on centralized rules and do not account for users' subjective sensitivities to harmful content. The authors propose a personalized inference framework based on large language models that combines a multi-agent architecture with user sensitivity modeling. Domain-specific Expert Agents, a Manager Agent that orchestrates content analysis and agent selection, and a Ghost Profile Agent that simulates the user's perspective together tailor content filtering to individual preferences. Against non-personalized baselines, the system shows up to a 32% improvement in accuracy, indicating closer alignment between moderation outcomes and users' personal sensitivity thresholds. The authors present the framework as a scalable way to balance individual digital rights with platform moderation policy.
📝 Abstract
The increasing scale and complexity of online platforms raise critical policy questions around harmful content, digital well-being, and user autonomy. Traditional content moderation systems rely on centralised, top-down rules and often fail to accommodate the subjective nature of harm perception. This paper proposes an LLM-based multi-agent personalised inference framework that filters content based on the unique sensitivity profiles of individual users. Our architecture combines domain-specific Expert Agents, a Manager Agent for orchestrating content analysis and agent selection, and a Ghost Profile Agent for simulating user perspectives, to inform moderation decisions. Evaluated against a range of non-personalised baselines, the system demonstrates up to a 32% improvement in accuracy, showing increased alignment with individual user sensitivities. Beyond technical performance, our framework provides policy-relevant insights for platform governance, offering a scalable way to reconcile moderation policies with societal and individual digital rights.
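The abstract names three agent roles but does not say how they exchange information, so the following is only a minimal sketch of one possible wiring, not the authors' implementation. Every identifier in it (`keyword_expert`, `SensitivityProfile`, `ManagerAgent.analyse`, `GhostProfileAgent.objections`, `moderate`) is hypothetical. Keyword counters stand in for the LLM-based Expert Agents, the Manager Agent is simplified to running every expert and keeping the categories that fired, and the 0.5 fallback tolerance is an assumed value.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Every name here is hypothetical; the abstract does not specify an API.
# An "expert" maps text to a severity score in [0, 1]. In the paper these
# are LLM-based agents; a keyword counter stands in for them here.
Expert = Callable[[str], float]


def keyword_expert(keywords: List[str]) -> Expert:
    """Build a toy expert that scores text by counting keyword hits."""
    def score(text: str) -> float:
        words = [w.strip(".,!?").lower() for w in text.split()]
        hits = sum(1 for w in words if w in keywords)
        return min(1.0, hits / 2)  # two or more hits saturate at 1.0
    return score


@dataclass
class SensitivityProfile:
    """Per-category tolerance: 0.0 tolerates nothing, 1.0 tolerates everything."""
    user_id: str
    tolerance: Dict[str, float]


class ManagerAgent:
    """Runs the experts and keeps only the categories that fired."""
    def __init__(self, experts: Dict[str, Expert]) -> None:
        self.experts = experts

    def analyse(self, text: str) -> Dict[str, float]:
        scores = {cat: expert(text) for cat, expert in self.experts.items()}
        return {cat: s for cat, s in scores.items() if s > 0.0}


class GhostProfileAgent:
    """Stands in for the user: lists categories the user would object to."""
    def __init__(self, profile: SensitivityProfile,
                 default_tolerance: float = 0.5) -> None:
        self.profile = profile
        self.default_tolerance = default_tolerance  # assumed fallback value

    def objections(self, severities: Dict[str, float]) -> List[str]:
        return sorted(
            cat for cat, severity in severities.items()
            if severity > self.profile.tolerance.get(cat, self.default_tolerance)
        )


def moderate(text: str, manager: ManagerAgent, ghost: GhostProfileAgent) -> str:
    """Return "hide" if the simulated user objects to any category, else "show"."""
    return "hide" if ghost.objections(manager.analyse(text)) else "show"


# The same post, judged against two different sensitivity profiles.
manager = ManagerAgent({
    "violence": keyword_expert(["fight", "blood"]),
    "profanity": keyword_expert(["damn"]),
})
post = "The fight left blood everywhere."
cautious = GhostProfileAgent(SensitivityProfile("u1", {"violence": 0.2}))
tolerant = GhostProfileAgent(SensitivityProfile("u2", {"violence": 1.0}))
decisions = (moderate(post, manager, cautious), moderate(post, manager, tolerant))
```

In this toy setup the post is hidden for the cautious profile and shown for the tolerant one, which is the kind of per-user divergence that a single centralised rule cannot express.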
Problem

Research questions and friction points this paper is trying to address.

content moderation
harm perception
user autonomy
personalised inference
digital well-being
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent system
personalized content moderation
large language models
user sensitivity profiling
digital rights