🤖 AI Summary
To address the challenge of ensuring that LLM-driven robots strictly adhere to user-specified constraints in complex, long-horizon natural language (NL) tasks, this paper proposes SELP (Safe Efficient LLM Planner), a safety-enhanced LLM planning framework. It combines three techniques: (1) equivalence voting for NL-to-LTL translation, which samples multiple Linear Temporal Logic (LTL) formulas, groups the semantically equivalent ones, and selects the majority group, making the logical specification more robust; (2) constrained decoding, which uses the resulting LTL formula to restrict autoregressive plan generation so that the generated plan conforms to the formula; and (3) domain-specific fine-tuning, which adapts the LLM to produce safe and efficient plans within a specific task domain. In experiments on drone navigation, SELP outperforms state-of-the-art planners by 10.8% in safety rate and by 19.8% in plan efficiency; on robot manipulation, it improves safety rate by 20.4%, indicating that the approach generalizes across robot agents and tasks.
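The equivalence voting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tuple encoding of formulas, the helper names (`holds`, `bounded_equivalent`, `equivalence_vote`), and the example candidates are all hypothetical. Equivalence is also only approximated here, by comparing formulas on every finite trace up to a small length, whereas a real system would use an exact LTL equivalence check.

```python
from itertools import product

# Hypothetical encoding: ("ap", name), ("not", f), ("and", f, g), ("or", f, g),
# ("F", f) for "eventually", ("G", f) for "always".

def holds(f, trace, i=0):
    """Evaluate formula f at position i of a finite trace (a tuple of sets of atoms)."""
    op = f[0]
    if op == "ap":
        return f[1] in trace[i]
    if op == "not":
        return not holds(f[1], trace, i)
    if op == "and":
        return holds(f[1], trace, i) and holds(f[2], trace, i)
    if op == "or":
        return holds(f[1], trace, i) or holds(f[2], trace, i)
    if op == "F":
        return any(holds(f[1], trace, j) for j in range(i, len(trace)))
    if op == "G":
        return all(holds(f[1], trace, j) for j in range(i, len(trace)))
    raise ValueError(f"unknown operator: {op}")

def atoms(f):
    """Collect the atomic propositions used in a formula."""
    if f[0] == "ap":
        return {f[1]}
    return set().union(*(atoms(g) for g in f[1:]))

def all_traces(aps, max_len):
    """Enumerate every trace over the given atoms with length 1..max_len."""
    states = [frozenset(a for a, bit in zip(aps, bits) if bit)
              for bits in product([0, 1], repeat=len(aps))]
    for n in range(1, max_len + 1):
        yield from product(states, repeat=n)

def bounded_equivalent(f, g, max_len=3):
    """Assumption: agreement on all short finite traces stands in for true equivalence."""
    aps = sorted(atoms(f) | atoms(g))
    return all(holds(f, t) == holds(g, t) for t in all_traces(aps, max_len))

def equivalence_vote(candidates, max_len=3):
    """Group equivalent candidates and return (representative, votes) of the largest group."""
    groups = []
    for f in candidates:
        for group in groups:
            if bounded_equivalent(f, group[0], max_len):
                group.append(f)
                break
        else:
            groups.append([f])
    winner = max(groups, key=len)
    return winner[0], len(winner)

# Illustrative samples for "eventually visit a and never visit b".
a, b = ("ap", "a"), ("ap", "b")
candidates = [
    ("and", ("F", a), ("G", ("not", b))),    # F a & G !b
    ("and", ("G", ("not", b)), ("F", a)),    # same formula, operands swapped
    ("and", ("F", a), ("not", ("F", b))),    # !F b is equivalent to G !b
    ("and", ("F", a), ("F", b)),             # wrong: requires visiting b
    ("F", ("and", a, ("not", b))),           # wrong: b may still be visited later
]
formula, votes = equivalence_vote(candidates)
print(formula, votes)
```

Grouping by semantics rather than by string means that syntactically different but equivalent samples reinforce each other, so a single mistranslated sample is outvoted.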
📝 Abstract
Despite significant advancements in large language models (LLMs) that enhance robot agents' understanding and execution of natural language (NL) commands, ensuring that the agents adhere to user-specified constraints remains challenging, particularly for complex commands and long-horizon tasks. To address this challenge, we present three key insights that significantly enhance LLM planners' capability in handling complex tasks: equivalence voting, constrained decoding, and domain-specific fine-tuning. Equivalence voting ensures consistency by sampling multiple Linear Temporal Logic (LTL) formulas from an NL command, grouping equivalent formulas, and selecting the majority group as the final LTL formula. Constrained decoding then uses the generated LTL formula to constrain the autoregressive inference of plans, ensuring that the generated plans conform to the LTL formula. Domain-specific fine-tuning customizes LLMs to produce safe and efficient plans within specific task domains. Our approach, Safe Efficient LLM Planner (SELP), combines these insights to create LLM planners that generate plans adhering to user commands with high confidence. We demonstrate the effectiveness and generalizability of SELP across different robot agents and tasks, including drone navigation and robot manipulation. For drone navigation tasks, SELP outperforms state-of-the-art planners by 10.8% in safety rate (i.e., finishing tasks in conformance with the NL commands) and by 19.8% in plan efficiency. For robot manipulation tasks, SELP achieves a 20.4% improvement in safety rate. Our datasets for evaluating NL-to-LTL translation and robot task planning will be released at github.com/lt-asset/selp.
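The idea behind constrained decoding can be sketched as follows. This is a simplified illustration under stated assumptions, not SELP's implementation: it works on whole actions rather than tokens, the monitor for the example formula `F a & G !b` is written by hand instead of being compiled from LTL, and `toy_model`, `step`, `is_allowed`, and the action names are hypothetical stand-ins for an LLM and its plan vocabulary.

```python
# Hand-written monitor for the illustrative formula  F a & G !b
# ("eventually visit a, never visit b"). States:
#   "pending"  : a has not been visited yet
#   "done"     : a has been visited and b never was
#   "violated" : b was visited, so the formula can no longer be satisfied

def step(state, action):
    """Advance the monitor by one action."""
    if state == "violated" or action == "goto_b":
        return "violated"
    if action == "goto_a":
        return "done"
    return state

def is_allowed(state, action):
    """An action is allowed if the plan can still satisfy the formula afterwards."""
    if action == "stop":
        return state == "done"          # stopping early would leave F a unmet
    return step(state, action) != "violated"

def toy_model(plan):
    """Hypothetical stand-in for an LLM: returns next actions ranked best-first."""
    rankings = [
        ["goto_b", "goto_c", "goto_a", "stop"],   # top choice is unsafe
        ["stop", "goto_a", "goto_b", "goto_c"],   # top choice stops too early
    ]
    if len(plan) < len(rankings):
        return rankings[len(plan)]
    return ["stop", "goto_a", "goto_b", "goto_c"]

def constrained_decode(propose, max_steps=10):
    """Greedy decoding that masks every action the monitor rejects."""
    state, plan = "pending", []
    for _ in range(max_steps):
        allowed = [act for act in propose(plan) if is_allowed(state, act)]
        if not allowed:
            raise RuntimeError("no compliant action available")
        action = allowed[0]             # highest-ranked compliant action
        if action == "stop":
            return plan
        plan.append(action)
        state = step(state, action)
    raise RuntimeError("step budget exhausted before the plan was complete")

plan = constrained_decode(toy_model)
print(plan)
```

In this toy run the model's preferred first action (`goto_b`) and its premature `stop` are both masked, so the decoder falls back to the next-ranked compliant actions. Because filtering happens at every generation step, the plan conforms to the formula by construction instead of being checked only after generation.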