🤖 AI Summary
This paper addresses non-adversarial physical safety risks that arise when LLM-driven embodied agents plan everyday tasks. To this end, we propose Safe-BeAl, a unified framework comprising (1) SafePlan-Bench, the first fine-grained physical safety benchmark, which covers eight categories of physical hazards across 2,027 daily tasks; and (2) Safe-Align, a safety alignment method that injects physical safety knowledge into LLM-based planning without degrading task performance. Safe-Align combines hazard-aware environment modeling, explicit safety constraint injection, and multi-dimensional alignment training. Experiments show that Safe-BeAl improves task-planning safety by 8.55–15.22% over GPT-4-based embodied agents while maintaining high task success rates. Our contributions are: the first comprehensive physical safety benchmark for embodied task planning; the first alignment technique that integrates domain-specific safety knowledge into LLM reasoning without sacrificing task performance; and empirical evidence of substantial safety gains without loss of functional performance.
📝 Abstract
Large Language Models (LLMs) show substantial promise for enhancing the task-planning capabilities of embodied agents, owing to their advanced reasoning and comprehension abilities. However, the systemic safety of these agents remains largely unexplored. In this study, we present Safe-BeAl, an integrated framework for measuring (SafePlan-Bench) and aligning (Safe-Align) the behavior of LLM-based embodied agents. SafePlan-Bench is a comprehensive benchmark for evaluating task-planning safety, comprising 2,027 daily tasks and their corresponding environments, distributed across 8 distinct hazard categories (e.g., Fire Hazard). Our empirical analysis reveals that LLM-based agents can exhibit unsafe behaviors even in the absence of adversarial inputs or malicious intent. To mitigate these hazards, we propose Safe-Align, a method that integrates physical-world safety knowledge into LLM-based embodied agents while preserving task-specific performance. Experiments across a variety of settings show that Safe-BeAl provides comprehensive safety validation, improving safety by 8.55–15.22% compared to GPT-4-based embodied agents while still ensuring successful task completion.