🤖 AI Summary
Large language models (LLMs) apply opaque, frequently updated content moderation policies that implicitly shape public discourse, yet no systematic mechanism exists to monitor them. To address this gap, we propose AI Watchman, a longitudinal auditing framework designed to analyze LLM content refusal behavior. It uses a dataset spanning over 400 sociopolitical topics to conduct automated, cross-temporal, and cross-model audits of leading models, including OpenAI's GPT-4.1 and GPT-5 and DeepSeek (in both English and Chinese), complemented by qualitative analysis. AI Watchman makes three contributions: (1) detection of unannounced shifts in content moderation policy; (2) quantitative measurement of moderation differences across companies and models; and (3) a systematic taxonomy of refusal types. Empirical evaluation shows that the system can expose otherwise opaque moderation behavior and support regulatory assessment and public oversight, establishing a reproducible foundation for longitudinal LLM transparency research.
📝 Abstract
Large language models' (LLMs') outputs are shaped by opaque and frequently changing company content moderation policies and practices. LLM moderation often takes the form of refusal; models' refusal to produce text about certain topics both reflects company policy and subtly shapes public discourse. We introduce AI Watchman, a longitudinal auditing system that publicly measures and tracks LLM refusals over time, providing transparency into an important but black-box aspect of LLMs. Using a dataset of over 400 social issues, we audit OpenAI's moderation endpoint, GPT-4.1, GPT-5, and DeepSeek (in both English and Chinese). We find that changes in company policies, even those not publicly announced, can be detected by AI Watchman, and we identify company- and model-specific differences in content moderation. We also qualitatively analyze and categorize different forms of refusal. This work contributes evidence for the value of longitudinal auditing of LLMs, and AI Watchman as one system for doing so.