Attestable Audits: Verifiable AI Safety Benchmarks Using Trusted Execution Environments

📅 2025-06-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI safety benchmarks do not produce verifiable results and cannot simultaneously protect model intellectual property and benchmark dataset confidentiality, a problem that becomes acute when model provider and auditor distrust each other. This paper proposes the first verifiable AI safety benchmarking framework to integrate remote attestation with privacy-preserving computation. Built on Trusted Execution Environments (TEEs), it isolates model and data in encrypted enclaves, supports secure interaction between mutually distrusting parties, and yields tamper-evident audit results. Crucially, it embeds TEE-based remote attestation into the AI auditing loop, enabling external stakeholders to independently verify compliance. A prototype on Llama-3.1 demonstrates end-to-end verifiable evaluation on standard safety benchmarks, including HarmBench and AdvBench, achieving integrity guarantees without exposing sensitive assets and making cross-organizational AI auditing practically feasible.
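
To make the attested flow concrete, here is a minimal enclave-side sketch: the enclave measures (hashes) the model weights and the benchmark dataset, runs the evaluation, and emits a report binding both measurements to the result. All identifiers (`ENCLAVE_KEY`, `run_audit`, the refusal-rate placeholder) are illustrative assumptions, not the paper's implementation; a real TEE would sign with a hardware-rooted attestation key (e.g. an Intel TDX or AMD SEV-SNP quote) rather than the HMAC stand-in used here.

```python
# Illustrative sketch (stdlib only) of the enclave-side audit step.
# The HMAC key stands in for a hardware-rooted attestation key.
import hashlib
import hmac
import json

ENCLAVE_KEY = b"enclave-attestation-key"  # hypothetical; real TEEs use hardware quotes


def measure(blob: bytes) -> str:
    """Hash an asset so the report binds to exactly what was evaluated."""
    return hashlib.sha256(blob).hexdigest()


def run_audit(model_weights: bytes, benchmark: list[str]) -> dict:
    """Inside the enclave: evaluate the model on the benchmark and emit a
    report that binds measurements of both inputs to the result."""
    refusal_rate = 0.97  # placeholder for the real safety evaluation
    report = {
        "model_measurement": measure(model_weights),
        "dataset_measurement": measure(json.dumps(benchmark).encode()),
        "benchmark": "HarmBench",
        "refusal_rate": refusal_rate,
    }
    payload = json.dumps(report, sort_keys=True).encode()
    report["signature"] = hmac.new(ENCLAVE_KEY, payload, hashlib.sha256).hexdigest()
    return report


print(run_audit(b"fake-weights", ["prompt-1", "prompt-2"]))
```

Because the signature covers the hashes of both the model and the dataset, neither party can later swap in a different model or a weakened benchmark without invalidating the report.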

📝 Abstract
Benchmarks are important measures to evaluate safety and compliance of AI models at scale. However, they typically do not offer verifiable results and lack confidentiality for model IP and benchmark datasets. We propose Attestable Audits, which run inside Trusted Execution Environments and enable users to verify interaction with a compliant AI model. Our work protects sensitive data even when model provider and auditor do not trust each other. This addresses verification challenges raised in recent AI governance frameworks. We build a prototype demonstrating feasibility on typical audit benchmarks against Llama-3.1.
Problem

Research questions and friction points this paper is trying to address.

Ensuring verifiable AI safety benchmark results
Protecting model IP and benchmark dataset confidentiality
Enabling trust between model providers and auditors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Trusted Execution Environments for verifiability
Protects sensitive AI model and benchmark data
Enables verification without mutual trust (see the sketch after this list)
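
As a companion to the enclave-side sketch above, the following sketch shows the relying party's check: verify the report signature and confirm the dataset measurement matches a published benchmark hash, without ever seeing model weights or prompts. `ENCLAVE_KEY` again stands in for the verification side of an attestation key, and all names are illustrative assumptions.

```python
# Verifier-side sketch: check the signed report against a published
# benchmark hash without access to model weights or prompts.
import hashlib
import hmac
import json

ENCLAVE_KEY = b"enclave-attestation-key"  # hypothetical stand-in for the attestation key


def verify_report(report: dict, expected_dataset_measurement: str) -> bool:
    """Return True iff the signature is valid and the report refers to the
    agreed benchmark dataset."""
    sig = report.pop("signature")
    payload = json.dumps(report, sort_keys=True).encode()
    ok_sig = hmac.compare_digest(
        sig, hmac.new(ENCLAVE_KEY, payload, hashlib.sha256).hexdigest()
    )
    return ok_sig and report["dataset_measurement"] == expected_dataset_measurement
```

This is the sense in which mutual trust is unnecessary: the auditor trusts the hardware attestation and the published dataset hash, not the model provider.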
Christoph Schnabl
Department of Computer Science and Technology, University of Cambridge
Daniel Hugenroth
Computer security researcher, University of Cambridge
Computer Security, Privacy, Mobile Devices, Applied Cryptography, Anonymous Communication
Bill Marino
PhD student, University of Cambridge
Machine Learning
Alastair R. Beresford
Department of Computer Science and Technology, University of Cambridge