International AI Safety Report 2025: Second Key Update: Technical Safeguards and Risk Management

📅 2025-11-24
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This second update assesses how two core risks from general-purpose AI are being managed: malicious misuse, particularly in dual-use domains such as biological weapons development, and insufficient system reliability. It surveys technical safeguards, including adversarial training, data curation, and monitoring systems, alongside the institutional frameworks that operationalise them, such as multi-tiered risk assessment policies. Key developments documented include: (1) three leading AI developers applying enhanced safeguards to new models after pre-deployment testing could not rule out misuse for biological weapons development; (2) the number of companies publishing Frontier AI Safety Frameworks more than doubling in 2025; and (3) governments and international organisations establishing early governance frameworks for general-purpose AI, centred on transparency mandates and risk assessment. Collectively, these developments mark a shift in risk management from isolated technical countermeasures toward systemic, institutionalised, and internationally coordinated governance.
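The multi-tiered, capability-threshold style of risk assessment mentioned above can be made concrete with a short sketch. This is a minimal illustration in the spirit of published frontier safety frameworks; the tier names, thresholds, and safeguard lists are invented for this example and do not come from the report.

```python
# Minimal sketch of a tiered capability-threshold policy, loosely in the
# spirit of published frontier safety frameworks. Tier names, thresholds,
# and required safeguards are hypothetical, for illustration only.
from enum import Enum

class Tier(Enum):
    LOW = 1
    ELEVATED = 2
    CRITICAL = 3

# Hypothetical mapping from risk tier to pre-committed deployment requirements.
POLICY = {
    Tier.LOW: ["standard content filters"],
    Tier.ELEVATED: ["enhanced refusal training", "output monitoring"],
    Tier.CRITICAL: ["deployment hold pending external review"],
}

def assess(bio_uplift_score: float) -> Tier:
    """Map a pre-deployment evaluation score (0-1) to a risk tier."""
    if bio_uplift_score >= 0.8:
        return Tier.CRITICAL
    if bio_uplift_score >= 0.5:
        return Tier.ELEVATED
    return Tier.LOW

tier = assess(0.62)
print(tier, "->", POLICY[tier])
```

The design point is that evaluation results map deterministically to pre-committed safeguards, so the decision to hold a deployment is made before, not after, a highly capable model exists.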

📝 Abstract
This second update to the 2025 International AI Safety Report assesses new developments in general-purpose AI risk management over the past year. It examines how researchers, public institutions, and AI developers are approaching risk management for general-purpose AI. In recent months, for example, three leading AI developers applied enhanced safeguards to their new models, as their internal pre-deployment testing could not rule out the possibility that these models could be misused to help create biological weapons. Beyond specific precautionary measures, there have been a range of other advances in techniques for making AI models and systems more reliable and resistant to misuse. These include new approaches in adversarial training, data curation, and monitoring systems. In parallel, institutional frameworks that operationalise and formalise these technical capabilities are starting to emerge: the number of companies publishing Frontier AI Safety Frameworks more than doubled in 2025, and governments and international organisations have established a small number of governance frameworks for general-purpose AI, focusing largely on transparency and risk assessment.
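To make the monitoring idea concrete, the sketch below screens prompts before they reach a model, assuming a simple keyword-scoring heuristic. Real deployments use trained classifiers, layered output filters, and human review; the `RISK_TERMS` weights and the `screen_prompt` helper here are hypothetical.

```python
# Minimal sketch of a pre-deployment misuse monitor, assuming a simple
# keyword-scoring heuristic. All terms and thresholds are illustrative,
# not taken from the report.
from dataclasses import dataclass

# Hypothetical dual-use indicator terms with weights (illustrative only).
RISK_TERMS = {"pathogen": 0.6, "synthesis route": 0.8, "enhance transmissibility": 1.0}

@dataclass
class Verdict:
    score: float
    escalate: bool  # True -> route to human review instead of answering

def screen_prompt(prompt: str, threshold: float = 0.7) -> Verdict:
    """Score a prompt against indicator terms and flag it for escalation."""
    text = prompt.lower()
    score = sum(w for term, w in RISK_TERMS.items() if term in text)
    return Verdict(score=min(score, 1.0), escalate=score >= threshold)

if __name__ == "__main__":
    print(screen_prompt("How do I enhance transmissibility of a pathogen?"))
    print(screen_prompt("Explain how vaccines train the immune system."))
```

Escalation rather than hard refusal is the usual choice at this layer, since over-blocking benign biology questions carries its own costs.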
Problem

Research questions and friction points this paper is trying to address.

Assessing new developments in general-purpose AI risk management approaches
Examining safeguards against AI misuse for biological weapons creation
Evaluating technical advances and institutional frameworks for AI safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced safeguards applied to new AI models
Adversarial training and data curation techniques (see the adversarial-training sketch after this list)
Monitoring systems and Frontier AI Safety Frameworks
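As a concrete illustration of the adversarial-training item above, the sketch below trains a toy linear classifier on FGSM-style worst-case perturbations of its inputs. Frontier-model safety training is far more involved (red-teaming, refusal training, RLHF); the model, data, and hyperparameters here are invented for illustration.

```python
# Minimal sketch of adversarial training on a toy linear classifier,
# using an FGSM-style perturbation. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # toy labels
w, b, lr, eps = np.zeros(2), 0.0, 0.1, 0.2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(100):
    # Craft worst-case inputs inside an L-infinity ball (FGSM step):
    # the gradient of the cross-entropy loss w.r.t. x is (p - y) * w.
    grad_x = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    X_adv = X + eps * np.sign(grad_x)
    # Train on the adversarial examples so the model stays robust to them.
    p = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p - y)) / len(y)
    b -= lr * float(np.mean(p - y))

print("accuracy on clean data:", float(np.mean((sigmoid(X @ w + b) > 0.5) == y)))
```

Training on the perturbed inputs rather than the clean ones is what distinguishes adversarial training from ordinary fitting: the model is optimised against the worst case inside the eps-ball around each example.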
Authors

Yoshua Bengio
Professor of computer science, University of Montreal, Mila, IVADO, CIFAR
Machine learning, deep learning, artificial intelligence
Stephen Clare
Carina Prunkl
Ethics Institute, Utrecht University
Ethics of AI, Governance of AI, Philosophy of Science and Technology, Philosophy of Physics
Maksym Andriushchenko
ELLIS Institute Tübingen & Max Planck Institute for Intelligent Systems
AI Safety, AI Alignment, LLMs, LLM agents
Ben Bucknall
DPhil Student, University of Oxford
Philip Fox
Nestor Maslej
Stanford University, The Stanford Institute for Human-Centered Artificial Intelligence
Artificial Intelligence
Conor McGlynn
Malcolm Murray
Shalaleh Rismani
Postdoctoral researcher
AI Ethics, robot ethics, human computer interaction, human robot interaction
Stephen Casper
PhD student, MIT
AI safety, AI responsibility, red-teaming, robustness, auditing
Jessica Newman
Daniel Privitera
Sören Mindermann
University of Oxford, OATML
AI safety, deep learning, active learning, causal inference, COVID-19
Daron Acemoglu
Economics, MIT
Thomas G. Dietterich
Fredrik Heintz
Professor of Computer Science, Linköping University
Artificial intelligence, Trustworthy AI, autonomous systems, multi agent systems, computational thinking
Geoffrey Hinton
Emeritus Prof. Computer Science, University of Toronto
machine learning, psychology, artificial intelligence, cognitive science, computer science
Nick Jennings
Vice-Chancellor and President, Loughborough University
AI, Artificial Intelligence, Multi-Agent Systems, Intelligent Agents, multiagent systems
Susan Leavy
University College Dublin, Insight Centre for Data Analytics
AI Ethics, Artificial Intelligence, Natural Language Processing, Algorithmic Bias, Digital Humanities
Teresa Ludermir
Vidushi Marda
Helen Margetts
Professor of Society and the Internet, University of Oxford
Political Science, Public Policy, Collective Action, Digital Government, Public Administration
John McDermid
University of York
Safety engineering, Assuring robotics and autonomous systems, Software engineering, AI
Jane Munga