🤖 AI Summary
Large language model (LLM)-driven robotic long-horizon task planning often lacks explicit safety mechanisms, leading to overlooked risks and unsafe execution. Method: This paper proposes SAFER, a safety-aware framework featuring a novel Safety Agent–LLM-as-a-Judge collaborative paradigm for multi-stage real-time risk assessment, proactive error correction, and interpretable verification; it is the first to embed control barrier functions (CBFs) into the LLM planning loop, bridging high-level symbolic reasoning with low-level safety-critical control. Technical contributions include a multi-LLM coordination architecture, quantitative safety evaluation metrics, and a hardware-in-the-loop multi-robot testbed. Results: Evaluated on complex heterogeneous tasks and realistic human–robot coexistence scenarios, SAFER significantly reduces safety violations while maintaining high task completion efficiency, thereby unifying planning safety and execution reliability.
📝 Abstract
The integration of large language models (LLMs) into robotic task planning has unlocked better reasoning capabilities for complex, long-horizon workflows. However, ensuring safety in LLM-driven plans remains a critical challenge, as these models often prioritize task completion over risk mitigation. This paper introduces SAFER (Safety-Aware Framework for Execution in Robotics), a multi-LLM framework designed to embed safety awareness into robotic task planning. SAFER employs a Safety Agent that operates alongside the primary task planner, providing safety feedback. Additionally, we introduce LLM-as-a-Judge, a novel metric leveraging LLMs as evaluators to quantify safety violations within generated task plans. Our framework integrates safety feedback at multiple stages of execution, enabling real-time risk assessment, proactive error correction, and transparent safety evaluation. We also integrate a control framework using Control Barrier Functions (CBFs) to ensure safety guarantees within SAFER's task planning. We evaluated SAFER against state-of-the-art LLM planners on complex long-horizon tasks involving heterogeneous robotic agents, demonstrating its effectiveness in reducing safety violations while maintaining task efficiency. We also verify the task planner and safety planner through actual hardware experiments involving multiple robots and a human.