AI Summary
This work addresses a limitation of traditional content moderation systems: they rely on centralized rules and fail to account for users' subjective sensitivity to harmful content. The authors propose a personalized inference framework based on large language models that combines a multi-agent architecture with user sensitivity modeling. Domain-specific Expert Agents analyze content, a Manager Agent orchestrates the analysis and selects which experts to consult, and a Ghost Profile Agent simulates the individual user's perspective, so that filtering decisions are tailored to each user's preferences. The approach improves alignment between moderation outcomes and users' personal sensitivity thresholds, with experiments showing up to a 32% improvement in accuracy over non-personalized baselines. The authors position it as a scalable way to balance individual rights with platform responsibilities.
Abstract
The increasing scale and complexity of online platforms raise critical policy questions around harmful content, digital well-being, and user autonomy. Traditional content moderation systems rely on centralised, top-down rules, often failing to accommodate the subjective nature of harm perception. This paper proposes an LLM-based multi-agent personalised inference framework that filters content based on the unique sensitivity profiles of individual users. To inform moderation decisions, our architecture combines domain-specific Expert Agents, a Manager Agent that orchestrates content analysis and agent selection, and a Ghost Profile Agent that simulates user perspectives. Evaluated against a range of non-personalised baselines, the system demonstrates up to a 32% improvement in accuracy, showing increased alignment with individual user sensitivities. Beyond technical performance, our framework provides policy-relevant insights for platform governance, offering a scalable way to reconcile moderation policies with societal and individual digital rights.
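The agent roles described in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names, the keyword-overlap scoring that stands in for LLM calls, the keyword-based rule the Manager uses to select experts, and the per-category thresholds in the hypothetical `UserProfile` are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    # Hypothetical per-category sensitivity thresholds in [0, 1];
    # a lower threshold means the user is more sensitive to that category.
    thresholds: dict[str, float]


@dataclass
class ExpertAgent:
    # Stand-in for a domain-specific LLM Expert Agent. Here it scores harm
    # for one category by counting keyword hits (an assumption, not an LLM).
    category: str
    keywords: set[str]

    def assess(self, text: str) -> float:
        words = set(text.lower().split())
        hits = len(words & self.keywords)
        return min(1.0, hits / 2)  # hypothetical: score saturates at 2 hits


class ManagerAgent:
    # Orchestrates analysis: selects relevant experts and collects their scores.
    def __init__(self, experts: list[ExpertAgent]):
        self.experts = {e.category: e for e in experts}

    def select_experts(self, text: str) -> list[ExpertAgent]:
        # Hypothetical selection rule: consult an expert only if at least
        # one of its keywords appears in the text.
        words = set(text.lower().split())
        return [e for e in self.experts.values() if words & e.keywords]

    def analyse(self, text: str) -> dict[str, float]:
        return {e.category: e.assess(text) for e in self.select_experts(text)}


class GhostProfileAgent:
    # Simulates the user's perspective by comparing expert scores
    # against that user's personal thresholds.
    def __init__(self, profile: UserProfile):
        self.profile = profile

    def should_filter(self, scores: dict[str, float]) -> bool:
        # Categories missing from the profile default to 1.0, so they are
        # filtered only at the maximum score.
        return any(
            score >= self.profile.thresholds.get(cat, 1.0)
            for cat, score in scores.items()
        )


def moderate(text: str, manager: ManagerAgent, ghost: GhostProfileAgent) -> bool:
    # Personalised decision: same expert analysis, user-specific verdict.
    return ghost.should_filter(manager.analyse(text))


if __name__ == "__main__":
    manager = ManagerAgent([
        ExpertAgent("violence", {"violence", "gore"}),
        ExpertAgent("profanity", {"damn"}),
    ])
    post = "a short clip showing violence"
    sensitive = GhostProfileAgent(UserProfile({"violence": 0.4}))
    tolerant = GhostProfileAgent(UserProfile({"violence": 0.8}))
    print(moderate(post, manager, sensitive))  # True
    print(moderate(post, manager, tolerant))   # False
```

The point of the split is that the expert analysis is computed once per item and shared, while only the final threshold comparison depends on the individual user; in the actual framework each of these steps would be performed by an LLM agent rather than a keyword heuristic.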