Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical absence of independent, credible third-party verification of the safety practices claimed by frontier AI companies, an absence that undermines the public's ability to assess their trustworthiness. The paper proposes the first auditing framework designed specifically for frontier AI systems, enabling qualified third-party experts to conduct in-depth evaluations of both AI systems and their development processes through controlled access to non-public information. Its core innovation is a tiered AI Assurance Level (AAL) scale, ranging from AAL-1 (one-time audits) to AAL-4 (continuous, deception-resistant validation), which establishes a systematic, quantifiable, comparable, and actionable paradigm for AI auditing. The framework provides a robust foundation for regulatory decisions, deployment choices, and public trust in advanced AI systems.

📝 Abstract
We outline a vision for frontier AI auditing, which we define as rigorous third-party verification of frontier AI developers' safety and security claims, and evaluation of their systems and practices against relevant standards, based on deep, secure access to non-public information. Frontier AI audits should not be limited to a company's publicly deployed products, but should instead consider the full range of organization-level safety and security risks, including internal deployment of AI systems, information security practices, and safety decision-making processes. We describe four AI Assurance Levels (AALs), the higher levels of which provide greater confidence in audit findings. We recommend AAL-1 as a baseline for frontier AI generally, and AAL-2 as a near-term goal for the most advanced subset of frontier AI developers. Achieving the vision we outline will require (1) ensuring high quality standards for frontier AI auditing, so it does not devolve into a checkbox exercise or lag behind changes in the industry; (2) growing the ecosystem of audit providers at a rapid pace without compromising quality; (3) accelerating adoption of frontier AI auditing by clarifying and strengthening incentives; and (4) achieving technical readiness for high AI Assurance Levels so they can be applied when needed.
Problem

Research questions and friction points this paper is trying to address.

Frontier AI
Third-party auditing
Safety and security
AI assurance
Trustworthiness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frontier AI Auditing
AI Assurance Levels
third-party verification
security evaluation
deception-resilient verification
👥 Authors

M. Brundage (AVERI)
Noemi Dreksler (Centre for the Governance of AI)
Aidan Homewood (GovAI)
Sean McGregor (AVERI)
Patricia Paskov (RAND)
Conrad Stosz (Transluce)
Girish Sastry (Independent)
A. F. Cooper (AVERI)
George Balston (AVERI)
Steven Adler (Clear-Eyed AI)
Stephen Casper (PhD student, MIT)
Markus Anderljung (Centre for the Governance of AI)
Grace Werner (AVERI)
Sören Mindermann (University of Oxford, OATML)
Vasilios Mavroudis (Research Scientist, Alan Turing Institute)
Benjamin Bucknall (University of Oxford)
Charlotte Stix (Apollo Research)
Jonas Freund (GovAI)
Lorenzo Pacchiardi (Research Associate, University of Cambridge)
J. Hernández-Orallo (University of Cambridge)
Matteo Pistillo (Apollo Research)
Michael Chen (Undergraduate, Carnegie Mellon University)
Chris Painter (METR)
Dean W. Ball (Foundation for American Innovation)
Cullen O'Keefe (Institute for Law and AI)
Gabriel Weil (Touro University Law Center)
Ben Harack (Oxford University Department of Politics and International Relations)
Graeme Finley (Independent)
Ryan Hassan (New Science)
Scott Emmons (Google DeepMind)
Charles Foster (METR)
Anka Reuel (CS Ph.D. Candidate, Stanford University)
Bri Treece (Fathom)
Y. Bengio (Mila, Université de Montréal)
Daniel Reti (Exona Lab)
Rishi Bommasani (CS PhD, Stanford University)
Cristian Trout (AI Underwriting Company)
A. Shamsabadi (Brave Software)
Rajiv Dattani (AI Underwriting Company)
Adrian Weller (Director of Research, Machine Learning, University of Cambridge)
Robert Trager (University of Oxford)
Jaime Sevilla (Director, Epoch)
Lauren Wagner (Abundance Institute)
Lisa Soder (London School of Economics)
Ketan Ramakrishnan (Yale University)
Henry Papadatos (SaferAI)
Malcolm Murray (SaferAI)
Ryan Tovcimak (UL Solutions)