In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI

📅 2025-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
The widespread deployment of general-purpose AI (GPAI) systems introduces novel security risks, yet the infrastructure for disclosing flaws in these systems remains underdeveloped, lacking standardized practices, governance frameworks, and institutional coordination, especially compared to mature domains such as software security. Method: The paper proposes a robust, third-party, cross-organizational flaw-reporting ecosystem for GPAI. It introduces a standardized AI flaw report template and rules of engagement for researchers; designs legally protected, broadly scoped GPAI flaw disclosure programs modeled on bug bounties; and calls for multi-stakeholder infrastructure to coordinate disclosure. The approach draws on security engineering, machine learning, law, and policy design to fit the characteristics of AI systems. Contribution: The proposed three-pronged intervention framework is intended to improve flaw reproducibility, response latency, and cross-provider flaw containment, offering a practical, scalable path toward safer, more transparent, and more accountable AI governance.

📝 Abstract
The widespread deployment of general-purpose AI (GPAI) systems introduces significant new risks. Yet the infrastructure, practices, and norms for reporting flaws in GPAI systems remain seriously underdeveloped, lagging far behind more established fields like software security. Based on a collaboration between experts from the fields of software security, machine learning, law, social science, and policy, we identify key gaps in the evaluation and reporting of flaws in GPAI systems. We call for three interventions to advance system safety. First, we propose using standardized AI flaw reports and rules of engagement for researchers in order to ease the process of submitting, reproducing, and triaging flaws in GPAI systems. Second, we propose GPAI system providers adopt broadly-scoped flaw disclosure programs, borrowing from bug bounties, with legal safe harbors to protect researchers. Third, we advocate for the development of improved infrastructure to coordinate distribution of flaw reports across the many stakeholders who may be impacted. These interventions are increasingly urgent, as evidenced by the prevalence of jailbreaks and other flaws that can transfer across different providers' GPAI systems. By promoting robust reporting and coordination in the AI ecosystem, these proposals could significantly improve the safety, security, and accountability of GPAI systems.
Problem

Research questions and friction points this paper is trying to address.

Standardizing AI flaw reports for easier submission and reproduction
Implementing broad-scoped flaw disclosure programs with legal protections
Developing infrastructure to coordinate flaw reports among stakeholders
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standardized AI flaw reports to ease submission, reproduction, and triage
Adoption of broadly-scoped flaw disclosure programs with legal safe harbors
Improved infrastructure for coordinating flaw reports across stakeholders
Authors

Shayne Longpre
MIT, Stanford, Apple
Deep Learning, Natural Language Understanding
Kevin Klyman
Stanford, Harvard
Foundation Models, AI Regulation, Geopolitics
Ruth E Appel
Stanford University
Sayash Kapoor
CS PhD, Princeton University
Reproducibility, AI agents, Societal impacts
Rishi Bommasani
CS PhD, Stanford University
Societal Impact of AI, AI Policy, AI Governance, Foundation Models
Michelle Sahar
OpenPolicy
Sean McGregor
UL Research Institutes
Avijit Ghosh
Hugging Face
Borhane Blili-Hamelin
AI Risk and Vulnerability Alliance
Nathan Butters
AI Risk and Vulnerability Alliance
Alondra Nelson
Institute for Advanced Study
Amit Elazari
OpenPolicy
Andrew Sellars
Clinical Associate Professor, Boston University School of Law
Intellectual Property, First Amendment, Data Privacy, Computer Crimes, Technology Policy
Casey John Ellis
Bugcrowd
Dane Sherrets
HackerOne
Dawn Song
Professor of Computer Science, UC Berkeley
Computer Security and Privacy
H. Geiger
Hacking Policy Council
Ilona Cohen
HackerOne
Lauren McIlvenny
Carnegie Mellon University Software Engineering Institute
Madhulika Srikumar
Partnership on AI
Mark M. Jaycox
Google
Markus Anderljung
Centre for the Governance of AI
AI governance, AI policy, AI forecasting
Nadine Farid Johnson
Knight First Amendment Institute at Columbia University
Nicholas Carlini
Anthropic
Nicolas Miailhe
PRISM Eval
Nik Marda
Mozilla
Peter Henderson
Princeton University
Machine Learning, Law
Rebecca S. Portnoff
Thorn
Rebecca Weiss
MLCommons
V. Westerhoff
Microsoft
Yacine Jernite
Research Scientist, HuggingFace
Machine Learning, Natural Language Processing
Rumman Chowdhury
Humane Intelligence
Percy Liang
Associate Professor of Computer Science, Stanford University
machine learning, natural language processing
Arvind Narayanan
Princeton University