🤖 AI Summary
This work addresses the problem that plans produced by large language models (LLMs) are often not executable, in part because existing approaches have the LLM emit an intermediate representation (e.g., code or formal logic) that must then be translated. The authors propose a planning method that uses the LLM's output as the heuristic for hill-climbing search, with no code generation or formal translation step, so that natural-language task descriptions map directly to executable action sequences. The approach combines prompting the LLM for a solution estimate, encoding actions in their original representation, and lightweight heuristic search. Evaluated in a common household environment, the method achieves a task success rate 22 percentage points higher than similar systems while producing consistently executable plans. The results indicate that LLM output can guide search without an intermediate language, which removes the translation step from the planning stack.
📝 Abstract
While systems designed for solving planning tasks vastly outperform Large Language Models (LLMs) in this domain, they usually discard the rich semantic information embedded within task descriptions. In contrast, LLMs possess parametrised knowledge across a wide range of topics, enabling them to leverage the natural language descriptions of planning tasks in their solutions. However, current research in this direction faces challenges in generating correct and executable plans. Furthermore, these approaches depend on the LLM to output solutions in an intermediate language, which must be translated into the representation language of the planning task. We introduce a novel planning method, which leverages the parametrised knowledge of LLMs by using their output as a heuristic for Hill-Climbing Search. This approach is further enhanced by prompting the LLM to generate a solution estimate to guide the search. Within a common household environment, our method achieves a task success rate 22 percentage points higher than that of similar systems, with consistently executable plans. All actions are encoded in their original representation, demonstrating that strong results can be achieved without an intermediate language, thus eliminating the need for a translation step.
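The abstract does not give the prompt, the scoring scheme, or the environment interface, so the following is only a minimal sketch of hill-climbing search driven by a pluggable heuristic, under stated assumptions. The toy two-location household domain, the names `hill_climb`, `successors`, and `toy_score`, and the visited-state set are all hypothetical; `toy_score` is a hand-written stand-in for the place where an LLM's rating of a candidate step would be used.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Toy state: (robot_location, cup_location). Hypothetical domain, not the paper's.
State = Tuple[str, str]
Action = str


def successors(state: State) -> Iterable[Tuple[Action, State]]:
    """Return every applicable action with its resulting state."""
    robot, cup = state
    other = "table" if robot == "sink" else "sink"
    moves = [(f"walk to {other}", (other, cup))]
    if cup == robot:  # the cup is at the robot's location and not held
        moves.append(("grab cup", (robot, "hand")))
    if cup == "hand":
        name = "put cup in sink" if robot == "sink" else "put cup on table"
        moves.append((name, (robot, robot)))
    return moves


def is_goal(state: State) -> bool:
    return state[1] == "sink"


def toy_score(state: State, action: Action, nxt: State) -> float:
    """Stand-in heuristic (assumption): a real system would query an LLM here."""
    robot, cup = nxt
    if cup == "sink":
        return 3.0
    if cup == "hand":
        return 2.0 if robot == "sink" else 1.0
    return 0.5 if robot == "table" else 0.0


def hill_climb(
    start: State,
    goal_test: Callable[[State], bool],
    expand: Callable[[State], Iterable[Tuple[Action, State]]],
    score: Callable[[State, Action, State], float],
    max_steps: int = 50,
) -> Optional[List[Action]]:
    """Greedily follow the best-scoring applicable action until the goal is reached."""
    plan: List[Action] = []
    state = start
    visited = {start}  # loop guard added for this sketch, not taken from the source
    for _ in range(max_steps):
        if goal_test(state):
            return plan
        candidates = [(a, s) for a, s in expand(state) if s not in visited]
        if not candidates:
            return None  # dead end: hill climbing does not backtrack
        action, state = max(candidates, key=lambda c: score(state, c[0], c[1]))
        plan.append(action)
        visited.add(state)
    return plan if goal_test(state) else None


plan = hill_climb(("sink", "table"), is_goal, successors, toy_score)
print(plan)
```

In this sketch every chosen action comes from `successors`, so any returned plan is applicable step by step in the toy domain, and the action strings are used as given with no translation between representations. Swapping `toy_score` for a function that prompts an LLM would not change the search loop.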