STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports

πŸ“… 2025-08-13
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Current AI model evaluations in chemical and biological (ChemBio) safety suffer from opaque reporting and a lack of standardized disclosure practices. Method: This paper introduces STREAM, the first transparent reporting standard specifically for ChemBio risk assessment of AI models. Drawing on best practices from government, civil society, academia, and industry, the authors develop a structured reporting framework, a standardized evaluation metadata schema, a concise three-page operational report template, and "gold-standard" exemplar reports. Contribution/Results: The standard defines disclosure dimensions and quality requirements for ChemBio safety evaluations, improves the completeness and reproducibility of reported assessment information, and enables independent third-party auditing and cross-model comparison. Developed in consultation with 23 experts spanning government, civil society, academia, and frontier AI companies, it aims to strengthen both public trust and methodological rigor in ChemBio safety assessment.
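The paper's actual metadata schema is not reproduced in this summary. As a minimal sketch only, the kind of fields such a disclosure record might standardize could look like the following; every field name here is an assumption for illustration, not STREAM's real schema:

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class EvalMetadata:
    # Hypothetical fields illustrating what a ChemBio evaluation
    # disclosure record might capture; NOT the paper's actual schema.
    benchmark_name: str              # which benchmark was run
    model_id: str                    # which model/version was evaluated
    elicitation_method: str          # e.g. "zero-shot", "best-of-n"
    num_samples: int                 # how many items were scored
    score: float                     # headline result on the benchmark
    human_baseline: Optional[float] = None  # comparison point, if any
    caveats: list = field(default_factory=list)  # known limitations

# Example record with made-up values
record = EvalMetadata(
    benchmark_name="example-chembio-qa",
    model_id="model-v1",
    elicitation_method="zero-shot",
    num_samples=500,
    score=0.62,
)

# A structured record like this can be serialized for a report appendix
print(asdict(record))
```

A machine-readable record of this general shape is what makes cross-model comparison and third-party auditing tractable, since auditors can check the same fields across reports.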

πŸ“ Abstract
Evaluations of dangerous AI capabilities are important for managing catastrophic risks. Public transparency into these evaluations - including what they test, how they are conducted, and how their results inform decisions - is crucial for building trust in AI development. We propose STREAM (A Standard for Transparently Reporting Evaluations in AI Model Reports), a standard to improve how model reports disclose evaluation results, initially focusing on chemical and biological (ChemBio) benchmarks. Developed in consultation with 23 experts across government, civil society, academia, and frontier AI companies, this standard is designed to (1) be a practical resource to help AI developers present evaluation results more clearly, and (2) help third parties identify whether model reports provide sufficient detail to assess the rigor of the ChemBio evaluations. We concretely demonstrate our proposed best practices with "gold standard" examples, and also provide a three-page reporting template to enable AI developers to implement our recommendations more easily.
Problem

Research questions and friction points this paper is trying to address.

Standardizing transparent AI evaluation reporting for trust building
Improving disclosure of ChemBio benchmark results in model reports
Providing practical tools for clear AI evaluation communication
Innovation

Methods, ideas, or system contributions that make the work stand out.

Standard for transparent AI evaluation reporting
Focus on chemical and biological benchmarks
Includes practical examples and reporting template
πŸ”Ž Similar Papers
No similar papers found.