Evaluating AI Companies' Frontier Safety Frameworks: Methodology and Results

📅 2025-11-30
🤖 AI Summary
Existing assessments of AI companies' frontier safety frameworks are high-level and lack empirically grounded, actionable evaluation methods. Method: The authors develop a fine-grained, operational assessment methodology, grounded in risk management principles from safety-critical industries, and evaluate 12 leading AI companies' frontier safety frameworks against 65 criteria across four dimensions: risk identification, risk analysis and evaluation, risk treatment, and risk governance. Contribution/Results: The evaluation reveals widespread weaknesses in industry practice (scores range from 8% to 35%), particularly in quantifying risk tolerances, defining thresholds for suspending development, and detecting unknown risks. Adopting best practices already present across the frameworks would raise scores to 52%. This work bridges the gap between abstract safety assessments and implementable evaluation, offering a structured, actionable roadmap for regulatory policy design and company safety practice.

📝 Abstract
Following the Seoul AI Safety Summit in 2024, twelve AI companies published frontier safety frameworks outlining their approaches to managing catastrophic risks from advanced AI systems. These frameworks now serve as a key mechanism for AI risk governance, utilized by regulations and governance instruments such as the EU AI Act's Code of Practice and California's Transparency in Frontier Artificial Intelligence Act. Given their centrality to AI risk management, assessments of such frameworks are warranted. Existing assessments evaluate them at a high level of abstraction and lack granularity on specific practices for companies to adopt. We address this gap by developing a 65-criteria assessment methodology grounded in established risk management principles from safety-critical industries. We evaluate the twelve frameworks across four dimensions: risk identification, risk analysis and evaluation, risk treatment, and risk governance. Companies' current scores are low, ranging from 8% to 35%. By adopting existing best practices already in use across the frameworks, companies could reach 52%. The most critical gaps are nearly universal: companies generally fail to (a) define quantitative risk tolerances, (b) specify capability thresholds for pausing development, and (c) systematically identify unknown risks. To guide improvement, we provide specific recommendations for each company and each criterion.
Problem

Research questions and friction points this paper is trying to address.

Develops a detailed methodology to assess AI safety frameworks
Evaluates twelve companies' risk management across four dimensions
Identifies critical gaps and provides specific improvement recommendations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed 65-criteria assessment method from safety-critical industries
Evaluated frameworks across four risk management dimensions
Identified key gaps in quantitative tolerances and thresholds
Lily Stelling
SaferAI
Artificial Intelligence · AI risk management
Malcolm Murray
Simeon Campos
Henry Papadatos