🤖 AI Summary
Large language models (LLMs) exhibit limited long-horizon planning capability without external feedback, typically producing plans that are both resource-inefficient to obtain and lower in quality than human baselines. To address this, we propose AoT+, an extension of the Algorithm-of-Thoughts framework that combines structured reasoning with step-wise self-validation, implicit state representation, and constrained multi-step backtracking. AoT+ enables fully internal, tool-free, re-prompting-free autonomous planning, requiring no external environment interaction or human intervention. For the first time, it enables models such as GPT-4 to autonomously surpass average human performance on classical planning benchmarks (e.g., Blocksworld), achieving state-of-the-art accuracy and executable plan validity. Our core contribution is the first LLM reasoning paradigm that reliably generates high-quality, long-horizon plans without any external feedback.
📝 Abstract
Large language models (LLMs) have demonstrated significant capabilities in natural language processing and reasoning, yet their effectiveness in autonomous planning remains under debate. While existing studies have used LLMs with external feedback mechanisms or in controlled environments for planning, these approaches often demand substantial computational and development resources due to the careful design and iterative backprompting they require. Moreover, even the most advanced LLMs, such as GPT-4, struggle to match human performance on standard planning benchmarks like Blocksworld without additional support. This paper investigates whether LLMs can independently generate long-horizon plans that rival human baselines. Our novel enhancements to Algorithm-of-Thoughts (AoT), which we dub AoT+, achieve state-of-the-art results on planning benchmarks, out-competing prior methods and human baselines, all autonomously.
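To make "executable plan validity" on Blocksworld concrete, a candidate plan can be replayed step by step against the domain's preconditions. The sketch below is a minimal illustrative simulator; the state encoding and action names (`pickup`, `putdown`, `stack`, `unstack`) are assumptions for illustration, not the paper's actual benchmark harness.

```python
# Minimal Blocksworld plan checker (illustrative sketch, not the paper's harness).
# State: {"on": {block: support}, "holding": block or None}; support is "table" or a block.

def is_clear(state, b):
    """A block is clear if nothing rests on it and the arm is not holding it."""
    return all(sup != b for sup in state["on"].values()) and state["holding"] != b

def step(state, action):
    """Apply one action, asserting its preconditions; returns the new state."""
    on, hold = dict(state["on"]), state["holding"]
    op, *args = action
    if op == "pickup":                      # pick a clear block up from the table
        (b,) = args
        assert hold is None and on.get(b) == "table" and is_clear(state, b)
        del on[b]; hold = b
    elif op == "unstack":                   # lift a clear block b off block c
        b, c = args
        assert hold is None and on.get(b) == c and is_clear(state, b)
        del on[b]; hold = b
    elif op == "putdown":                   # place the held block on the table
        (b,) = args
        assert hold == b
        on[b] = "table"; hold = None
    elif op == "stack":                     # place the held block b onto clear block c
        b, c = args
        assert hold == b and is_clear(state, c)
        on[b] = c; hold = None
    else:
        raise ValueError(f"unknown action {op}")
    return {"on": on, "holding": hold}

def valid_plan(init, goal, plan):
    """A plan is valid if every step's preconditions hold and the goal is reached."""
    state = init
    for action in plan:
        try:
            state = step(state, action)
        except AssertionError:
            return False                     # a precondition failed: inexecutable plan
    return all(state["on"].get(b) == sup for b, sup in goal.items())

# Example: invert a two-block tower (A on B  ->  B on A).
init = {"on": {"A": "B", "B": "table", "C": "table"}, "holding": None}
goal = {"B": "A"}
plan = [("unstack", "A", "B"), ("putdown", "A"), ("pickup", "B"), ("stack", "B", "A")]
print(valid_plan(init, goal, plan))  # -> True
```

An external validator like this is exactly the feedback loop that prior methods rely on; the point of AoT+ is that the model produces plans passing such a check without ever seeing its verdicts.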