Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5

📅 2026-02-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the severe risks posed by frontier artificial intelligence models in domains such as cybersecurity, manipulative persuasion, strategic deception, uncontrolled development, and self-replication. The work systematically evaluates these five risk dimensions by innovatively constructing high-risk scenarios—including inter-LLM persuasion, agent “mal-evolution,” and resource-constrained self-replication—and integrates risk modeling, adversarial experimentation, behavioral monitoring, and safety assessment to propose and validate several actionable mitigation mechanisms. Through this approach, the research achieves fine-grained quantitative analysis of critical risks, thereby establishing a technical foundation and empirical support for the safe deployment and governance of advanced AI systems.

Technology Category

Application Category

📝 Abstract
To understand and identify the unprecedented risks posed by rapidly advancing artificial intelligence (AI) models, Frontier AI Risk Management Framework in Practice presents a comprehensive assessment of their frontier risks. As Large Language Models (LLMs) general capabilities rapidly evolve and the proliferation of agentic AI, this version of the risk analysis technical report presents an updated and granular assessment of five critical dimensions: cyber offense, persuasion and manipulation, strategic deception, uncontrolled AI R\&D, and self-replication. Specifically, we introduce more complex scenarios for cyber offense. For persuasion and manipulation, we evaluate the risk of LLM-to-LLM persuasion on newly released LLMs. For strategic deception and scheming, we add the new experiment with respect to emergent misalignment. For uncontrolled AI R\&D, we focus on the ``mis-evolution''of agents as they autonomously expand their memory substrates and toolsets. Besides, we also monitor and evaluate the safety performance of OpenClaw during the interaction on the Moltbook. For self-replication, we introduce a new resource-constrained scenario. More importantly, we propose and validate a series of robust mitigation strategies to address these emerging threats, providing a preliminary technical and actionable pathway for the secure deployment of frontier AI. This work reflects our current understanding of AI frontier risks and urges collective action to mitigate these challenges.
Problem

Research questions and friction points this paper is trying to address.

frontier AI
risk management
large language models
agentic AI
emergent risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

frontier AI risk
emergent misalignment
LLM-to-LLM persuasion
mis-evolution
resource-constrained self-replication
🔎 Similar Papers
No similar papers found.
D
Dongrui Liu
Shanghai AI Laboratory
Y
Yi Yu
Shanghai AI Laboratory
Jie Zhang
Jie Zhang
Unknown affiliation
Guanxu Chen
Guanxu Chen
Shanghai Jiao Tong University
Trustworthy AIInterpretability
Q
Qihao Lin
Shanghai AI Laboratory
H
Hanxi Zhu
Shanghai AI Laboratory
L
Lige Huang
Shanghai AI Laboratory
Y
Yijin Zhou
Shanghai AI Laboratory
P
Peng Wang
Shanghai AI Laboratory
S
Shuai Shao
Shanghai AI Laboratory
B
Boxuan Zhang
Shanghai AI Laboratory
Z
Zicheng Liu
Shanghai AI Laboratory
J
Jingwei Sun
Shanghai AI Laboratory
Y
Yu Li
Shanghai AI Laboratory
Yuejin Xie
Yuejin Xie
Huazhong University of Science and Technology
LLM SafetyTrustworthy AI
J
Jiaxuan Guo
Shanghai AI Laboratory
J
Jia Xu
Shanghai AI Laboratory
Chaochao Lu
Chaochao Lu
Shanghai AI Laboratory
Causal AI
B
Bowen Zhou
Shanghai AI Laboratory
Xia Hu
Xia Hu
Google DeepMind
Deep LearningMachine LearningMultimodal
Jing Shao
Jing Shao
Research Scientist, Shanghai AI Laboratory/Shanghai Jiao Tong University
Computer VisionMulti-Modal Large Language Model