🤖 AI Summary
Regulatory oversight of high-risk AI systems suffers from insufficient explainability and a lack of standardized benchmarking, leading to accountability gaps and eroded stakeholder trust.
Method: This study establishes a tiered explainability framework and benchmark evaluation system, introducing the first use-case-specific AI compliance certification mechanism for anticipated high-risk applications. Drawing on regulatory precedents from the U.S. FDA and the EU AI Act, it proposes a dedicated AI regulatory authority and mandates standardized internal auditing, fairness-aware statistical assessment, and input impact analysis. It further outlines automated auditing tools, a structured benchmarking framework, a compliance certificate generation system, and a public AI registry.
Contribution/Results: The work delivers an actionable, risk-based governance blueprint designed to enhance regulatory efficiency and user trust. It provides technical infrastructure and operational templates for international regulatory frameworks, including the EU AI Act, enabling scalable, evidence-based AI oversight.
📝 Abstract
We propose establishing an office to oversee AI systems by introducing a tiered system of explainability and benchmarking requirements for commercial AI systems. We examine how complex high-risk technologies have been successfully regulated at the national level. Specifically, we draw parallels to the existing regulation of the U.S. medical device and pharmaceutical industries (overseen by the FDA), the proposed legislation for AI in the European Union (the AI Act), and existing U.S. anti-discrimination legislation. To promote accountability and user trust, AI accountability mechanisms shall introduce standardized measures for each category of intended high-risk use of AI systems, enabling structured comparisons among such systems. We suggest using explainable AI techniques, such as input influence measures, as well as fairness statistics and other performance measures of high-risk AI systems. We propose standardizing internal benchmarking and automated audits to transparently characterize high-risk AI systems. The results of such audits and benchmarks shall be clearly and transparently communicated and explained to enable meaningful comparisons of competing AI systems via a public AI registry. Such standardized audits, benchmarks, and certificates shall be specific to the intended high-risk use of the respective AI systems and could constitute conformity assessments for AI systems, e.g., under the European Union's AI Act.
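To illustrate the kinds of audit measures the abstract refers to, the following is a minimal sketch of two of them: a fairness statistic (demographic parity difference) and a permutation-based input influence measure. The function names, the toy model, and the permutation approach are illustrative assumptions, not the paper's specified audit procedure.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    """Fairness statistic: absolute gap in positive-prediction
    rates between two protected groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return abs(rates[0] - rates[1])

def permutation_influence(model, X, y, feature, n_repeats=10, seed=0):
    """Input influence measure: mean drop in accuracy when one
    input feature is shuffled, breaking its link to the labels."""
    rng = np.random.default_rng(seed)
    base = (model(X) == y).mean()
    drops = []
    for _ in range(n_repeats):
        Xp = X.copy()
        Xp[:, feature] = rng.permutation(Xp[:, feature])
        drops.append(base - (model(Xp) == y).mean())
    return float(np.mean(drops))
```

In an audit setting, such statistics would be computed per intended high-risk use case and reported in a standardized form, so that competing AI systems can be compared on the same scale in a public registry.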