A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents

📅 2025-04-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses non-adversarial physical safety risks in LLM-driven embodied agents for everyday task planning. The authors propose Safe-BeAl, a unified framework comprising (1) SafePlan-Bench, the first fine-grained physical-safety evaluation benchmark, covering eight categories of physical hazards across 2,027 daily tasks, and (2) Safe-Align, a safety-alignment method that injects physical-safety knowledge into LLM-based planning without degrading task performance. Safe-Align combines hazard-aware environmental modeling, explicit safety-constraint injection, and multi-dimensional alignment training. Experiments across a range of settings show that the framework improves task-planning safety by 8.55-15.22% over embodied agents based on GPT-4 while maintaining high task success rates, outperforming existing baselines. The stated contributions are: the first comprehensive physical-safety benchmark for embodied task planning; an alignment technique that integrates domain-specific safety knowledge into LLM reasoning without loss of functional performance; and empirical validation of substantial safety gains without compromising task completion.
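The detection logic itself is not reproduced on this page, but the core idea of SafePlan-Bench-style evaluation, screening each step of a generated plan against per-hazard-category conditions, can be pictured with a minimal sketch. Everything below is a hypothetical illustration: the Step format, the two hazard rules, and the toy world state are assumptions, not the benchmark's actual implementation (only the "Fire Hazard" category name comes from the paper).

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical plan step: an action applied to an object in the scene.
@dataclass(frozen=True)
class Step:
    action: str       # e.g., "turn_on", "place", "pour"
    obj: str          # e.g., "stove", "towel", "water"
    target: str = ""  # optional second object, e.g., place obj on target

# Illustrative hazard rules, one predicate per category. The paper
# defines 8 categories; these two rules are invented examples.
HAZARD_RULES: dict[str, Callable[[Step, set[str]], bool]] = {
    "Fire Hazard": lambda s, state: (
        s.action == "place"
        and s.obj in {"towel", "paper"}
        and s.target == "stove"
        and "stove_on" in state
    ),
    "Electrical Shock Hazard": lambda s, state: (
        s.action == "pour" and s.obj == "water" and s.target == "outlet"
    ),
}

def screen_plan(plan: list[Step]) -> list[tuple[int, str]]:
    """Return (step_index, hazard_category) for every unsafe step.

    The toy world state tracks only whether the stove is on; a real
    benchmark would model the environment in far more detail.
    """
    state: set[str] = set()
    violations = []
    for i, step in enumerate(plan):
        if step.action == "turn_on" and step.obj == "stove":
            state.add("stove_on")
        for category, is_hazard in HAZARD_RULES.items():
            if is_hazard(step, state):
                violations.append((i, category))
    return violations

# A plan that turns the stove on and then places a towel on it is
# flagged as a Fire Hazard at step 1, with no adversarial input needed.
plan = [Step("turn_on", "stove"), Step("place", "towel", "stove")]
print(screen_plan(plan))  # -> [(1, 'Fire Hazard')]
```

Note how the unsafe step arises from an ordinary, benign task, which mirrors the paper's point that hazards appear even without adversarial inputs or malicious intent.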

📝 Abstract
Large Language Models (LLMs) exhibit substantial promise in enhancing task-planning capabilities within embodied agents due to their advanced reasoning and comprehension. However, the systemic safety of these agents remains an underexplored frontier. In this study, we present Safe-BeAl, an integrated framework for the measurement (SafePlan-Bench) and alignment (Safe-Align) of LLM-based embodied agents' behaviors. SafePlan-Bench establishes a comprehensive benchmark for evaluating task-planning safety, encompassing 2,027 daily tasks and corresponding environments distributed across 8 distinct hazard categories (e.g., Fire Hazard). Our empirical analysis reveals that even in the absence of adversarial inputs or malicious intent, LLM-based agents can exhibit unsafe behaviors. To mitigate these hazards, we propose Safe-Align, a method designed to integrate physical-world safety knowledge into LLM-based embodied agents while maintaining task-specific performance. Experiments across a variety of settings demonstrate that Safe-BeAl provides comprehensive safety validation, improving safety by 8.55-15.22%, compared to embodied agents based on GPT-4, while ensuring successful task completion.
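As a rough mental model of the benchmark's structure described above, each item might pair a daily task with its environment and hazard category, with safety reported as the fraction of plans that trigger no hazard. This is a sketch under assumptions: the BenchItem fields, the safety_rate definition, and every category label other than "Fire Hazard" are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical shape of one benchmark item; field names are assumptions
# based on the abstract (2,027 tasks, paired environments, 8 hazard categories).
@dataclass
class BenchItem:
    task: str               # natural-language daily task
    environment: list[str]  # objects present in the paired scene
    hazard_category: str    # one of 8 categories, e.g., "Fire Hazard"

def safety_rate(plan_is_safe: list[bool]) -> float:
    """Fraction of evaluated plans that triggered no hazard."""
    return sum(plan_is_safe) / len(plan_is_safe)

# Toy evaluation over three items; only "Fire Hazard" is named in the
# paper, the other two category labels are invented for illustration.
items = [
    BenchItem("heat up the soup", ["stove", "pot", "towel"], "Fire Hazard"),
    BenchItem("charge the phone", ["outlet", "charger", "cup of water"], "Electrical Shock Hazard"),
    BenchItem("slice the bread", ["knife", "bread", "plate"], "Sharp Object Hazard"),
]
print(f"safety rate: {safety_rate([True, True, False]):.2%}")  # -> 66.67%
```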
Problem

Research questions and friction points this paper is trying to address.

Measure safety in LLM-based embodied agents
Align agents with physical-world safety knowledge
Benchmark task-planning safety across hazard categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

SafePlan-Bench benchmarks task-planning safety across 2,027 daily tasks
Safe-Align integrates physical-world safety knowledge into LLM planners
Improves safety by 8.55-15.22% over GPT-4-based agents (toy reading of this range sketched below)
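The summary does not specify whether the 8.55-15.22% range denotes absolute percentage points or a relative gain; the toy calculation below assumes absolute percentage-point improvements in safety rate, with an invented baseline.

```python
# Toy reading of the reported 8.55-15.22% range, assuming the figures are
# absolute percentage-point gains in safety rate over a GPT-4-based agent.
# The baseline value is invented purely for illustration.
baseline = 0.70                 # hypothetical GPT-4 agent safety rate
for gain in (0.0855, 0.1522):   # endpoints of the reported range
    print(f"+{gain * 100:.2f} pts -> {(baseline + gain) * 100:.2f}% safety rate")
# +8.55 pts -> 78.55% safety rate
# +15.22 pts -> 85.22% safety rate
```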
👥 Authors

Yuting Huang
University of Science and Technology of China

Leilei Ding
University of Science and Technology of China

Zhipeng Tang
UMass Amherst

Tianfu Wang
University of Science and Technology of China

Xinrui Lin
University of Science and Technology of China
Cognitive Robotics, Answer Set Programming, 3D Computer Vision

Wuyang Zhang
University of Science and Technology of China

Mingxiao Ma
University of Science and Technology of China

Yanyong Zhang
University of Science and Technology of China; Rutgers University (Adjunct Visiting Professor)
Sensing, Cyber-Physical Systems, Multi-Modal Perception, Efficient AI Systems