🤖 AI Summary
This work systematically evaluates the feasibility of replacing classical planners (e.g., Fast Downward) with large language models (LLMs) for robot task planning. Methodologically, it introduces zero-shot PDDL prompting—feeding PDDL domain and problem files directly to LLMs across multiple benchmarks—and quantifies plan executability via execution fidelity. Results show that while LLMs achieve moderate success on simple tasks, their performance degrades substantially on complex scenarios, revealing fundamental limitations in maintaining state consistency, modeling resource constraints, and performing precise logical reasoning. The key contributions are: (1) the first cross-benchmark evaluation framework comparing LLMs and classical planners specifically for robot planning; (2) empirical identification of critical weaknesses in LLMs' structured reasoning capabilities; and (3) the proposal of an "LLM + classical planner" hybrid paradigm as a practical path toward robust, scalable robotic planning systems.
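The evaluation pipeline described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's actual code: the prompt wording, the toy single-ball simulator, and the function names (`build_zero_shot_prompt`, `execution_fidelity`, `apply_action`) are all assumptions made for clarity.

```python
def build_zero_shot_prompt(domain_pddl: str, problem_pddl: str) -> str:
    """Assemble a zero-shot prompt that feeds raw PDDL files to an LLM.

    Hypothetical prompt format; the paper's exact wording may differ.
    """
    return (
        "You are a planner. Output a valid plan, one action per line.\n\n"
        f";; Domain\n{domain_pddl}\n\n"
        f";; Problem\n{problem_pddl}\n\nPlan:"
    )


def execution_fidelity(plan, apply_action, state):
    """Fraction of the plan that executes in order before the first
    action whose preconditions fail (one plausible fidelity metric)."""
    executed = 0
    for action in plan:
        next_state = apply_action(state, action)
        if next_state is None:  # precondition violated: stop executing
            break
        state = next_state
        executed += 1
    return executed / len(plan) if plan else 0.0


def apply_action(state, action):
    """Toy gripper-style simulator: a single ball can be picked up
    when on the table and dropped when held; anything else fails."""
    if action == "(pick ball)" and state == "on-table":
        return "holding"
    if action == "(drop ball)" and state == "holding":
        return "on-table"
    return None
```

For example, a generated plan whose third action repeats `(drop ball)` while nothing is held executes only its first two actions, giving a fidelity of 2/3.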
📝 Abstract
Recent advancements in Large Language Models have sparked interest in their potential for robotic task planning. While these models demonstrate strong generative capabilities, their effectiveness in producing structured and executable plans remains uncertain. This paper presents a systematic evaluation of a broad spectrum of state-of-the-art language models, each directly prompted with Planning Domain Definition Language (PDDL) domain and problem files, and compares their planning performance with the Fast Downward planner across a variety of benchmarks. In addition to measuring success rates, we assess how faithfully the generated plans translate into sequences of actions that can actually be executed, identifying both strengths and limitations of using these models in this setting. Our findings show that while the models perform well on simpler planning tasks, they continue to struggle with more complex scenarios that require precise resource management, consistent state tracking, and strict constraint compliance. These results underscore fundamental challenges in applying language models to robotic planning in real-world environments. By outlining the gaps that emerge during execution, we aim to guide future research toward combined approaches that integrate language models with classical planners in order to enhance the reliability and scalability of planning in autonomous robotics.