Open-Domain Safety Policy Construction

📅 2026-04-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high cost of manually drafting and maintaining domain-specific content safety policies by proposing Deep Policy Research (DPR), a minimal agentic system that drafts a full content moderation policy from only human-written seed domain information. DPR runs a task-specific, structured research loop built on a single web search tool and lightweight scaffolding: it iteratively proposes search queries, distills diverse web sources into policy rules, and organizes the rules into an indexed document. On the OpenAI undesired content benchmark (five domains, two compact reader LLMs) and an in-house multimodal advertisement moderation benchmark, DPR consistently outperforms definition-only and in-context learning baselines, and in the end-to-end setting it is competitive with expert-written policy sections in several domains. Under the same seed specification and evaluation protocol, it also outperforms a general-purpose deep research system.

📝 Abstract

Moderation layers are increasingly a core component of many products built on user- or model-generated content. However, drafting and maintaining domain-specific safety policies remains costly. We present Deep Policy Research (DPR), a minimal agentic system that drafts a full content moderation policy based on only human-written seed domain information. DPR uses a single web search tool and lightweight scaffolding to iteratively propose search queries, distill diverse web sources into policy rules, and organize rules into an indexed document. We evaluate DPR on (1) the OpenAI undesired content benchmark across five domains with two compact reader LLMs and (2) an in-house multimodal advertisement moderation benchmark. DPR consistently outperforms definition-only and in-context learning baselines, and in our end-to-end setting it is competitive with expert-written policy sections in several domains. Moreover, under the same seed specification and evaluation protocol, DPR outperforms a general-purpose deep research system, suggesting that a task-specific, structured research loop can be more effective than generic web research for policy drafting. We release our experiment code at https://github.com/xiaowu0162/deep-policy-research.
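
The loop described in the abstract (propose queries, search, distill rules, organize them into an indexed document) can be sketched as follows. This is a minimal illustration, not the released implementation: every name here (`PolicyDocument`, `draft_policy`, `propose_queries`, `web_search`, `distill_rules`) is hypothetical, the fixed round budget and the stop-when-nothing-new rule are assumptions, and the search tool and reader LLM are replaced by hard-coded stubs so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class PolicyDocument:
    """Indexed policy: section title -> ordered list of rules (hypothetical layout)."""
    domain: str
    sections: Dict[str, List[str]] = field(default_factory=dict)

    def add_rule(self, section: str, rule: str) -> bool:
        """Add a rule to a section; return False if it is an exact duplicate."""
        rules = self.sections.setdefault(section, [])
        if rule in rules:
            return False
        rules.append(rule)
        return True

    def render(self) -> str:
        """Render sections and rules with hierarchical indices such as 1.2."""
        lines = [f"Policy: {self.domain}"]
        for s_idx, (section, rules) in enumerate(self.sections.items(), start=1):
            lines.append(f"{s_idx}. {section}")
            for r_idx, rule in enumerate(rules, start=1):
                lines.append(f"  {s_idx}.{r_idx} {rule}")
        return "\n".join(lines)


def draft_policy(
    seed: str,
    propose_queries: Callable[[str, PolicyDocument], Iterable[str]],
    web_search: Callable[[str], Iterable[str]],
    distill_rules: Callable[[str, str], Iterable[Tuple[str, str]]],
    rounds: int = 3,  # assumed budget; the abstract does not specify one
) -> PolicyDocument:
    """Iterate: propose queries -> search -> distill rules -> file them in the document."""
    doc = PolicyDocument(domain=seed)
    for _ in range(rounds):
        added = 0
        for query in propose_queries(seed, doc):
            for page in web_search(query):
                for section, rule in distill_rules(seed, page):
                    if doc.add_rule(section, rule):
                        added += 1
        if added == 0:  # assumed stopping rule: no new rules in this round
            break
    return doc


# Offline stubs standing in for the LLM and the search tool (illustrative only).
def stub_propose(seed: str, doc: PolicyDocument) -> List[str]:
    return [f"{seed} content policy examples"]


def stub_search(query: str) -> List[str]:
    return ["Threats of violence against a person are disallowed."]


def stub_distill(seed: str, page: str) -> List[Tuple[str, str]]:
    return [("Disallowed content", page)]


doc = draft_policy("harassment", stub_propose, stub_search, stub_distill)
print(doc.render())
```

In a real system the three callables would wrap a search API and a reader LLM. Passing them in as parameters keeps the loop itself independent of any particular provider.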

Problem

Research questions and friction points this paper is trying to address.

content moderation

safety policy

open-domain

policy drafting

domain-specific

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Policy Research

content moderation policy

agentic system