Towards Safer Chatbots: A Framework for Policy Compliance Evaluation of Custom GPTs

📅 2025-02-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Custom GPTs—deployed in platforms like the OpenAI GPT Store—pose significant security and compliance risks due to their opaque “black-box” nature, yet systematic evaluation frameworks remain absent. Method: We propose the first end-to-end automated compliance evaluation framework comprising three components: (1) automatic discovery of Custom GPTs, (2) policy-guided red-teaming prompt generation based on regulatory classification, and (3) LLM-as-a-judge–driven violation detection, validated by human annotation (F1 = 0.975). Contribution/Results: Evaluated on 782 real-world Custom GPTs, we find that 58.7% violate OpenAI’s Usage Policies; violations stem predominantly from underlying base models—not user customization—and correlate neither with popularity nor deployment frequency. These findings expose critical gaps in the GPT Store’s current moderation pipeline and provide empirical evidence and a methodological foundation for cross-platform large language model governance.

Technology Category

Application Category

📝 Abstract
Large Language Models (LLMs) have gained unprecedented prominence, achieving widespread adoption across diverse domains and integrating deeply into society. The capability to fine-tune general-purpose LLMs, such as Generative Pre-trained Transformers (GPT), for specific tasks has facilitated the emergence of numerous Custom GPTs. These tailored models are increasingly made available through dedicated marketplaces, such as OpenAI's GPT Store. However, their black-box nature introduces significant safety and compliance risks. In this work, we present a scalable framework for the automated evaluation of Custom GPTs against OpenAI's usage policies, which define the permissible behaviors of these systems. Our framework integrates three core components: (1) automated discovery and data collection of models from the GPT store, (2) a red-teaming prompt generator tailored to specific policy categories and the characteristics of each target GPT, and (3) an LLM-as-a-judge technique to analyze each prompt-response pair for potential policy violations. We validate our framework with a manually annotated ground truth, and evaluate it through a large-scale study with 782 Custom GPTs across three categories: Romantic, Cybersecurity, and Academic GPTs. Our manual annotation process achieved an F1 score of 0.975 in identifying policy violations, confirming the reliability of the framework's assessments. The results reveal that 58.7% of the analyzed models exhibit indications of non-compliance, exposing weaknesses in the GPT store's review and approval processes. Furthermore, our findings indicate that a model's popularity does not correlate with compliance, and non-compliance issues largely stem from behaviors inherited from base models rather than user-driven customizations. We believe this approach is extendable to other chatbot platforms and policy domains, improving LLM-based systems safety.
Problem

Research questions and friction points this paper is trying to address.

Customized Large Language Models
Safety and Compliance
Black Box Nature
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated Compliance Framework
Customized GPT Models
Security Enhancement
🔎 Similar Papers
No similar papers found.
D
David Rodriguez
ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain
William Seymour
William Seymour
Lecturer in Cybersecurity, King's College London
CybersecurityVoice AssistantsSmart HomesPrivacy
J
Jose M. Del Alamo
ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain
Jose Such
Jose Such
Research Professor, Spanish National Research Council (CSIC)
Privacy & SecurityArtificial IntelligenceHuman-Computer Interaction