🤖 AI Summary
This work addresses the challenges of safety-critical robotic task planning, where classical planners scale poorly, reinforcement learning generalizes weakly, and large language models (LLMs) lack formal safety guarantees. The authors propose SafeGen-LLM, which introduces the first multi-domain benchmark of explicit safety constraints encoded in PDDL3 and employs a two-stage post-training framework. First, supervised fine-tuning aligns natural language instructions with formal task specifications; then, a fine-grained reward mechanism based on formal verification, combined with curriculum learning, guides Group Relative Policy Optimization (GRPO) to refine the policy. Evaluated on previously unseen safety properties, SafeGen-LLM generalizes strongly across domains, significantly outperforming state-of-the-art closed-source baselines while achieving high safety compliance and cross-domain transferability.
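The verification-based fine-grained reward described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the constraint checkers are toy stand-ins for a real PDDL3 verifier, and the partial-credit scheme (fraction of constraints satisfied) is an assumption.

```python
# Hypothetical sketch of a fine-grained, verification-based plan reward.
# Each "constraint" is a predicate over the plan; a real system would
# instead run a formal PDDL3 constraint verifier on the plan trace.

def plan_reward(plan, constraints):
    """Return a reward in [0, 1]: 1.0 only if the plan satisfies every
    safety constraint; partial credit for each satisfied constraint."""
    if not plan:  # empty or unparsable plan gets zero reward
        return 0.0
    satisfied = sum(1 for check in constraints if check(plan))
    return satisfied / len(constraints)

# Toy "always"-style constraint: never hold more than one item at a time.
def never_two_items(plan):
    held = 0
    for step in plan:
        held += step.count("pickup") - step.count("drop")
        if held > 1:
            return False
    return True

# Toy "sometime"-style constraint: the delivery action eventually occurs.
def eventually_goal(plan):
    return any("deliver" in step for step in plan)

reward = plan_reward(["pickup a", "deliver a"],
                     [never_two_items, eventually_goal])
```

Grading by the fraction of satisfied constraints (rather than a binary pass/fail) gives the policy a denser learning signal, which is the usual motivation for fine-grained rewards in RL fine-tuning.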
📝 Abstract
Safety-critical task planning in robotic systems remains challenging: classical planners scale poorly, Reinforcement Learning (RL)-based methods generalize poorly, and base Large Language Models (LLMs) cannot guarantee safety. To address this gap, we propose a safety-generalizable large language model, SafeGen-LLM. SafeGen-LLM not only improves the safety satisfaction of task plans but also generalizes well to novel safety properties across domains. We first construct a multi-domain Planning Domain Definition Language 3 (PDDL3) benchmark with explicit safety constraints. We then introduce a two-stage post-training framework: Supervised Fine-Tuning (SFT) on a constraint-compliant planning dataset to learn planning syntax and semantics, followed by Group Relative Policy Optimization (GRPO) guided by fine-grained reward machines derived from formal verification to enforce safety alignment, and by curriculum learning to better handle complex tasks. Extensive experiments show that SafeGen-LLM achieves strong safety generalization and outperforms frontier proprietary baselines across multi-domain planning tasks and multiple input formats (e.g., PDDL and natural language).
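GRPO's core update signal is a group-relative advantage: several candidate plans are sampled per task, each is scored by the reward, and each score is normalized by the mean and standard deviation of its group. A minimal sketch of that normalization step (the group size and reward values below are illustrative, not from the paper):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled plan's reward by
    the mean and standard deviation of its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mean) / std for r in rewards]

# Four plans sampled for one task, scored by a safety reward in [0, 1]:
advs = group_relative_advantages([1.0, 0.5, 0.5, 0.0])
```

Because advantages are computed within each group rather than by a learned value model, plans that satisfy more safety constraints than their siblings are reinforced even when absolute rewards are low early in training.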