🤖 AI Summary
This study systematically evaluates how six mainstream generative AI chatbots respond to nine conspiracy theories (five well-known, comprehensively debunked theories and four emerging, event-driven ones) in order to assess the effectiveness of their safety mechanisms against conspiracy theory propagation. Using a platform policy implementation audit and cross-model comparison, the study reveals pronounced thematic selectivity in corporate content governance: models are highly sensitive to race-related topics and major societal traumas, yet respond weakly to other high-risk conspiracy theories, particularly those concerning public health and technology ethics. Results indicate substantial inter-model variation in safety performance; several systems still indirectly endorse, evade, or give ambiguous responses to conspiracy claims, exposing structural deficiencies that include uneven safety coverage and poorly generalized moderation rules. The findings establish an empirical benchmark for AI content safety governance and suggest targeted, evidence-based paths for improvement.
📝 Abstract
Interactive chat systems built on artificial intelligence frameworks are increasingly ubiquitous: they are embedded into search engines, Web browsers, and operating systems, or available on websites and apps. Research efforts have sought to understand the limitations and potential for harm of generative AI, and we contribute to this work here. Conducting a systematic review of six AI-powered chat systems (ChatGPT 3.5; ChatGPT 4o Mini; Microsoft Copilot in Bing; Google Search AI; Perplexity; and Grok in Twitter/X), this study examines how these leading products respond to questions related to conspiracy theories, following the platform policy implementation audit approach established by Glazunova et al. (2023). We select five well-known and comprehensively debunked conspiracy theories, as well as four emerging conspiracy theories that related to breaking news events at the time of data collection. Our findings demonstrate that the extent of safety guardrails against conspiratorial ideation in generative AI chatbots differs markedly depending on the chatbot model and the conspiracy theory. Our observations indicate that these guardrails are often designed very selectively: generative AI companies appear to focus especially on ensuring that their products are not seen to be racist, and they also appear to pay particular attention to conspiracy theories that address topics of substantial national trauma, such as 9/11, or that relate to well-established political issues. Future work should extend this ongoing effort to further platforms, multiple languages, and a range of conspiracy theories reaching well beyond the United States.