A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents

📅 2025-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses non-adversarial physical safety risks in LLM-driven embodied agents for everyday task planning. It proposes Safe-BeAl, a unified framework comprising (1) SafePlan-Bench, a fine-grained physical safety evaluation benchmark covering eight categories of physical hazards across 2,027 daily tasks; and (2) Safe-Align, a safety alignment method that injects physical safety knowledge into LLM-based planning while preserving task performance. Safe-Align integrates hazard-aware environmental modeling, explicit safety constraint injection, and multi-dimensional alignment training. Experiments show that Safe-BeAl improves task-planning safety by 8.55–15.22% relative to GPT-4-based embodied agents while maintaining successful task completion. The contributions are a physical safety benchmark for embodied task planning, an alignment technique for integrating domain-specific safety knowledge into LLM reasoning, and empirical validation of safety gains without compromising functional performance.

📝 Abstract

Large Language Models (LLMs) exhibit substantial promise in enhancing task-planning capabilities within embodied agents due to their advanced reasoning and comprehension. However, the systemic safety of these agents remains an underexplored frontier. In this study, we present Safe-BeAl, an integrated framework for the measurement (SafePlan-Bench) and alignment (Safe-Align) of LLM-based embodied agents' behaviors. SafePlan-Bench establishes a comprehensive benchmark for evaluating task-planning safety, encompassing 2,027 daily tasks and corresponding environments distributed across 8 distinct hazard categories (e.g., Fire Hazard). Our empirical analysis reveals that even in the absence of adversarial inputs or malicious intent, LLM-based agents can exhibit unsafe behaviors. To mitigate these hazards, we propose Safe-Align, a method designed to integrate physical-world safety knowledge into LLM-based embodied agents while maintaining task-specific performance. Experiments across a variety of settings demonstrate that Safe-BeAl provides comprehensive safety validation, improving safety by 8.55–15.22% compared with embodied agents based on GPT-4 while ensuring successful task completion.

Problem

Research questions and friction points this paper is trying to address.

Measure safety in LLM-based embodied agents

Align agents with physical-world safety knowledge

Benchmark task-planning safety across hazard categories
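
As a rough illustration of what benchmarking safety across hazard categories can involve, the sketch below aggregates per-category safety and task-success rates from a list of evaluated plans. It is not the paper's evaluation code: the `PlanResult` record, its fields, and the `rates_by_category` helper are hypothetical names introduced here, and only "Fire Hazard" is a category named in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanResult:
    """Hypothetical record for one evaluated plan (not the paper's schema)."""
    hazard_category: str   # e.g. "Fire Hazard"
    is_safe: bool          # plan avoided the hazard
    is_successful: bool    # plan completed the task


def rates_by_category(results):
    """Return {category: {"safety_rate": ..., "success_rate": ...}}."""
    # Per category: [total plans, safe plans, successful plans]
    counts = defaultdict(lambda: [0, 0, 0])
    for r in results:
        c = counts[r.hazard_category]
        c[0] += 1
        c[1] += int(r.is_safe)
        c[2] += int(r.is_successful)
    return {
        category: {
            "safety_rate": safe / total,
            "success_rate": successful / total,
        }
        for category, (total, safe, successful) in counts.items()
    }


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    sample = [
        PlanResult("Fire Hazard", True, True),
        PlanResult("Fire Hazard", False, True),
        PlanResult("Hypothetical Category B", True, False),
    ]
    for category, rates in rates_by_category(sample).items():
        print(category, rates)
```

Reporting both rates side by side reflects the abstract's framing: a safety gain only counts if task completion is maintained.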

Innovation

Methods, ideas, or system contributions that make the work stand out.

SafePlan-Bench benchmarks task-planning safety on 2,027 daily tasks across 8 hazard categories

Safe-Align integrates safety knowledge into LLMs

Improves safety by 8.55–15.22% over GPT-4-based agents while maintaining task completion
