Policy-as-Prompt: Rethinking Content Moderation in the Age of Large Language Models

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Content moderation in the large language model (LLM) era still relies heavily on manual annotation and rigid rule engines, which limits scalability, adaptability, and interpretability. Method: The paper formalises "policy-as-prompt" — an emerging paradigm in which natural-language policy texts are fed directly to LLMs as prompts, enabling zero-shot or few-shot moderation that is dynamic, interpretable, and updatable without model retraining or extensive data curation. Contribution/Results: It identifies five key challenges across four dimensions — technical (policy-to-prompt translation, sensitivity to prompt structure and formatting), sociotechnical (technological determinism in policy formation), organisational (evolving roles between policy and machine learning teams), and governance (model governance and accountability) — and discusses mitigation approaches, offering actionable insights for practitioners building scalable, adaptive moderation systems.

📝 Abstract
Content moderation plays a critical role in shaping safe and inclusive online environments, balancing platform standards, user expectations, and regulatory frameworks. Traditionally, this process involves operationalising policies into guidelines, which are then used by downstream human moderators for enforcement, or to further annotate datasets for training machine learning moderation models. However, recent advancements in large language models (LLMs) are transforming this landscape. These models can now interpret policies directly as textual inputs, eliminating the need for extensive data curation. This approach offers unprecedented flexibility, as moderation can be dynamically adjusted through natural language interactions. This paradigm shift raises important questions about how policies are operationalised and the implications for content moderation practices. In this paper, we formalise the emerging policy-as-prompt framework and identify five key challenges across four domains: Technical Implementation (1. translating policy to prompts, 2. sensitivity to prompt structure and formatting), Sociotechnical (3. the risk of technological determinism in policy formation), Organisational (4. evolving roles between policy and machine learning teams), and Governance (5. model governance and accountability). Through analysing these challenges across technical, sociotechnical, organisational, and governance dimensions, we discuss potential mitigation approaches. This research provides actionable insights for practitioners and lays the groundwork for future exploration of scalable and adaptive content moderation systems in digital ecosystems.
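The core mechanism the abstract describes — the policy itself is a plain-text input to the model, so moderation behaviour changes by editing a string rather than retraining — can be sketched as follows. This is a minimal illustration, not the paper's implementation; `llm` stands in for any chat-completion client, and the policy text and prompt wording are invented for the example.

```python
# Minimal sketch of the policy-as-prompt pattern: the moderation policy is
# natural-language text injected into the prompt, so a policy update is a
# string edit rather than a retraining run.

POLICY = """\
1. No personal attacks or harassment.
2. No sharing of private personal information.
"""


def build_prompt(policy: str, content: str) -> str:
    """Compose a zero-shot moderation prompt from the raw policy text."""
    return (
        "You are a content moderator. Apply the policy below.\n\n"
        f"POLICY:\n{policy}\n"
        f"CONTENT:\n{content}\n\n"
        "Answer with exactly one word: ALLOW or REMOVE."
    )


def moderate(content: str, policy: str = POLICY, llm=None) -> str:
    """Return a moderation verdict for `content` under `policy`.

    `llm` is any callable mapping a prompt string to a completion string
    (e.g. a wrapper around an API client); none is bundled here.
    """
    if llm is None:
        raise ValueError("supply an LLM callable, e.g. an API client wrapper")
    return llm(build_prompt(policy, content)).strip().upper()
```

Because the policy is just an argument, a policy revision takes effect on the next call — `moderate(text, policy=revised_policy, llm=client)` — which is what makes the approach dynamic, but also exposes it to the paper's second challenge: the verdict can shift with the prompt's structure and formatting, not only its meaning.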
Problem

Research questions and friction points this paper is trying to address.

Redefining content moderation using large language models
Addressing challenges in policy-to-prompt translation
Exploring governance and accountability in AI moderation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs interpret policies directly
Dynamic moderation via natural language
Policy-as-prompt framework formalized