AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual-language models (VLMs) deployed in embodied agents pose significant safety risks when executing hazardous instructions, yet no standardized benchmark exists for evaluating such risks in embodied settings. Method: We propose the first safety evaluation benchmark for embodied agents executing dangerous instructions. Inspired by Asimov’s Three Laws of Robotics, we construct a risk-aware instruction dataset covering both primitive hazardous commands and jailbreak variants. We design an entity-action alignment adapter to bridge high-level semantic planning with low-level executable actions. Evaluation is conducted systematically across perception–planning–execution stages within an embodied simulation sandbox. Contribution: We introduce a standardized benchmark comprising 45 adversarial scenarios, 1,350 tasks, and 8,100 hazardous instructions. This is the first framework enabling end-to-end, quantitative safety assessment of VLM-driven embodied agents under dangerous instruction conditions.

📝 Abstract
The rapid advancement of vision-language models (VLMs) and their integration into embodied agents have unlocked powerful capabilities for decision-making. However, as these systems are increasingly deployed in real-world environments, they face mounting safety concerns, particularly when responding to hazardous instructions. In this work, we propose AGENTSAFE, the first comprehensive benchmark for evaluating the safety of embodied VLM agents under hazardous instructions. AGENTSAFE simulates realistic agent-environment interactions within a simulation sandbox and incorporates a novel adapter module that bridges the gap between high-level VLM outputs and low-level embodied controls. Specifically, it maps recognized visual entities to manipulable objects and translates abstract plans into executable atomic actions in the environment. Building on this, we construct a risk-aware instruction dataset inspired by Asimov's Three Laws of Robotics, comprising base risky instructions and mutated jailbroken variants. The benchmark includes 45 adversarial scenarios, 1,350 hazardous tasks, and 8,100 hazardous instructions, enabling systematic testing under adversarial conditions across the perception, planning, and action execution stages.
Problem

Research questions and friction points this paper is trying to address.

Evaluating safety of embodied VLM agents under hazardous instructions
Bridging gap between VLM outputs and low-level embodied controls
Systematic testing with adversarial scenarios and hazardous tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates agent-environment interactions in sandbox
Novel adapter module bridges VLM-embodied controls
Risk-aware dataset based on Asimov's Three Laws
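The adapter described above grounds VLM-recognized entities in simulator objects and converts plan steps into atomic actions. A minimal sketch of that idea, assuming a sandbox that exposes named objects and a small atomic-action vocabulary (all names here, e.g. `EntityActionAdapter` and `ATOMIC_ACTIONS`, are illustrative, not from the paper):

```python
# Hypothetical entity-action alignment adapter: grounds entity names from a
# VLM plan to simulator objects and keeps only executable atomic actions.
from difflib import get_close_matches

ATOMIC_ACTIONS = {"goto", "pickup", "put", "open", "close", "toggle"}

class EntityActionAdapter:
    def __init__(self, scene_objects):
        # scene_objects: object IDs available in the simulation sandbox
        self.scene_objects = list(scene_objects)

    def ground_entity(self, entity):
        # Fuzzy-match a recognized entity name to the closest scene object.
        lowered = [o.lower() for o in self.scene_objects]
        matches = get_close_matches(entity.lower(), lowered, n=1, cutoff=0.6)
        if not matches:
            return None
        return self.scene_objects[lowered.index(matches[0])]

    def translate(self, plan_steps):
        # Turn (verb, entity) plan steps into executable atomic actions,
        # dropping any step that cannot be grounded or is not in the vocabulary.
        actions = []
        for verb, entity in plan_steps:
            obj = self.ground_entity(entity)
            if verb in ATOMic_ACTIONS if False else (verb in ATOMIC_ACTIONS and obj is not None):
                actions.append((verb, obj))
        return actions

adapter = EntityActionAdapter(["Knife_1", "StoveBurner_2", "Mug_3"])
print(adapter.translate([("pickup", "knife"), ("ignite", "stove burner")]))
# → [('pickup', 'Knife_1')]  ("ignite" is not an atomic action, so it is dropped)
```

In a safety benchmark this filtering step matters: hazardous plan steps that survive grounding and translation are exactly the ones the execution-stage evaluation must score.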
Aishan Liu
Beihang University, China

Zonghao Ying
SKLCCSE, BUAA
Trustworthy AI

Le Wang
Beihang University, China

Junjie Mu
Politecnico di Milano, Italy

Jinyang Guo
The University of Sydney
Deep Learning, Efficient Methods, Edge Computing

Jiakai Wang
Zhongguancun Laboratory
Adversarial Examples, Trustworthy AI

Yuqing Ma
Beihang University, China

Siyuan Liang
College of Computing and Data Science, Nanyang Technological University
Trustworthy Foundation Model

Mingchuan Zhang
Henan University of Science and Technology, China

Xianglong Liu
Beihang University, China; Zhongguancun Laboratory, China

Dacheng Tao
Nanyang Technological University
Artificial Intelligence, Machine Learning, Computer Vision, Image Processing, Data Mining