Evaluating the Critical Risks of Amazon's Nova Premier under the Frontier Model Safety Framework

📅 2025-07-07
🤖 AI Summary
This study assesses the safety of Amazon's Nova Premier, a multimodal large language model, in three high-risk domains: chemical, biological, radiological, and nuclear (CBRN) threats, offensive cyber operations, and automated AI R&D. The goal is to determine whether the model exceeds release thresholds. Method: Guided by the Frontier Model Safety Framework (FMSF), we present the first comprehensive evaluation of Nova Premier's critical risk profile, combining automated benchmarks, expert red-teaming, and uplift studies. The model under evaluation processes text, images, and video with a one-million-token context window. Contribution/Results: Nova Premier does not exceed the release thresholds and is judged safe for public release, in line with commitments made at the 2025 Paris AI Safety Summit. We also summarize the evaluation methodology and commit to enhancing the safety evaluation and mitigation pipelines as new risks and capabilities associated with frontier models are identified.

📝 Abstract
Nova Premier is Amazon's most capable multimodal foundation model and teacher for model distillation. It processes text, images, and video with a one-million-token context window, enabling analysis of large codebases, 400-page documents, and 90-minute videos in a single prompt. We present the first comprehensive evaluation of Nova Premier's critical risk profile under the Frontier Model Safety Framework. Evaluations target three high-risk domains -- Chemical, Biological, Radiological & Nuclear (CBRN), Offensive Cyber Operations, and Automated AI R&D -- and combine automated benchmarks, expert red-teaming, and uplift studies to determine whether the model exceeds release thresholds. We summarize our methodology and report core findings. Based on this evaluation, we find that Nova Premier is safe for public release as per our commitments made at the 2025 Paris AI Safety Summit. We will continue to enhance our safety evaluation and mitigation pipelines as new risks and capabilities associated with frontier models are identified.
Problem

Research questions and friction points this paper is trying to address.

Evaluating Nova Premier's critical risks under the Frontier Model Safety Framework
Assessing risks in CBRN, offensive cyber operations, and automated AI R&D
Determining whether Nova Premier exceeds release thresholds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluation of a multimodal foundation model with a one-million-token context window
Comprehensive risk evaluation combining automated benchmarks, expert red-teaming, and uplift studies
Release-threshold safety assessment under the Frontier Model Safety Framework
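
As an illustration of the release decision described above, the following Python sketch checks per-domain evidence against a release threshold. It is only a sketch under stated assumptions: the paper summary does not publish scores, thresholds, or an aggregation rule, so the names `DomainEvidence` and `release_decision`, the [0, 1] score scale, and the "any single evidence source can trip the threshold" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The three domains named in the abstract.
DOMAINS = ("CBRN", "Offensive Cyber Operations", "Automated AI R&D")


@dataclass(frozen=True)
class DomainEvidence:
    """Hypothetical container for one domain's evaluation evidence.

    All scores are assumed to be normalised to [0, 1]; the source does not
    define a scale, so this is purely illustrative.
    """
    domain: str
    benchmark_score: float  # automated benchmarks
    red_team_score: float   # expert red-teaming
    uplift_score: float     # uplift studies
    threshold: float        # hypothetical release threshold for this domain

    def exceeds_threshold(self) -> bool:
        # Assumed conservative rule: the strongest single signal decides.
        strongest = max(self.benchmark_score, self.red_team_score, self.uplift_score)
        return strongest >= self.threshold


def release_decision(evidence: List[DomainEvidence]) -> Tuple[bool, List[str]]:
    """Return (releasable, flagged_domains).

    A release is only considered when every domain has been evaluated and
    none of them exceeds its threshold.
    """
    missing = set(DOMAINS) - {e.domain for e in evidence}
    if missing:
        raise ValueError(f"missing evaluations for: {sorted(missing)}")
    flagged = [e.domain for e in evidence if e.exceeds_threshold()]
    return (not flagged, flagged)
```

Requiring all three domains before a decision is returned mirrors the framing in the abstract, where the release finding rests on the evaluation as a whole; the numeric rule itself is a placeholder, not the authors' method.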
Satyapriya Krishna
Harvard University
Trustworthy AI, Large Language Models, Explainable & Fair ML
Ninareh Mehrabi
Amazon
AI Safety, Responsible AI
Abhinav Mohanty
Amazon Nova Responsible AI
Matteo Memelli
Amazon Nova Responsible AI
Vincent Ponzo
Amazon Nova Responsible AI
Payal Motwani
Amazon Nova Responsible AI
Rahul Gupta
Amazon Nova Responsible AI