🤖 AI Summary
Large language models (LLMs) suffer from low reliability in long-horizon, complex robotic task planning due to insufficient external knowledge integration and lack of formal verification.

Method: This paper proposes a neuro-symbolic hierarchical planning framework integrating three core components: (1) knowledge graph-enhanced retrieval-augmented generation (KG-RAG), (2) multi-level task decomposition, and (3) a first-order logic-based symbolic verifier for real-time world-state consistency checking—enabling stepwise mapping from high-level tasks to executable atomic actions.

Contribution/Results: To our knowledge, this is the first work to deeply integrate all three components and introduce an interpretable symbolic failure detection mechanism. Evaluated on diverse complex robotic tasks, our framework significantly outperforms baselines, improving planning accuracy by over 37%. Furthermore, we propose novel evaluation metrics that effectively quantify LLMs' compositional reasoning capability.
📝 Abstract
Large Language Models (LLMs) have shown promise as robotic planners but often struggle with long-horizon and complex tasks, especially in specialized environments that require external knowledge. While hierarchical planning and Retrieval-Augmented Generation (RAG) address some of these challenges, each is insufficient on its own, and a deeper integration is required to achieve more reliable systems. To this end, we propose a neuro-symbolic approach that enhances LLM-based planners with Knowledge Graph-based RAG for hierarchical plan generation. This method decomposes complex tasks into manageable subtasks, which are further expanded into executable atomic action sequences. To ensure formal correctness and proper decomposition, we integrate a Symbolic Validator, which also functions as a failure detector by aligning expected and observed world states. Our evaluation against baseline methods demonstrates consistent, significant advantages of integrating hierarchical planning, symbolic verification, and RAG across tasks of varying complexity and different LLMs. Additionally, our experimental setup and novel metrics not only validate our approach for complex planning but also serve as a tool for assessing LLMs' reasoning and compositional capabilities.
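To make the symbolic-validation idea concrete, here is a minimal sketch (not the paper's actual implementation; all names, facts, and the STRIPS-style precondition/effect encoding are illustrative assumptions) of how a validator can simulate an LLM-generated atomic-action sequence against a symbolic world state and localize the first failing step:

```python
# Illustrative sketch only: a toy symbolic validator in the spirit of the
# paper's approach. Action names, fact strings, and the STRIPS-style
# precondition/add/delete encoding are hypothetical, not from the paper.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # facts that must hold before execution
    add: frozenset            # facts the action makes true
    delete: frozenset         # facts the action makes false

def validate_plan(initial_state, plan, goal):
    """Symbolically simulate the plan; return (ok, failed_step_index, final_state).

    failed_step_index is None when no precondition is violated; ok is True
    only when, additionally, the goal facts hold in the final state.
    """
    state = set(initial_state)
    for i, act in enumerate(plan):
        missing = act.preconditions - state
        if missing:
            # Expected world state diverges from the plan's requirements:
            # report the offending step for interpretable failure detection.
            return False, i, state
        state = (state - act.delete) | act.add
    return goal <= state, None, state

# Toy task: move a cup from the table to a shelf, decomposed into atoms.
pick = Action("pick(cup)",
              frozenset({"at(robot,table)", "on(cup,table)"}),
              frozenset({"holding(cup)"}),
              frozenset({"on(cup,table)"}))
place = Action("place(cup,shelf)",
               frozenset({"holding(cup)", "at(robot,shelf)"}),
               frozenset({"on(cup,shelf)"}),
               frozenset({"holding(cup)"}))

ok, failed_step, final_state = validate_plan(
    {"at(robot,table)", "on(cup,table)"},  # observed initial world state
    [pick, place],                          # plan missing a move(table,shelf) step
    goal={"on(cup,shelf)"},
)
# The validator rejects the plan at `place` because at(robot,shelf) never
# becomes true, mirroring how a missing decomposition step is localized.
```

In a full system, the rejected step and the missing facts would be fed back to the LLM planner to repair the decomposition, rather than surfacing only a binary pass/fail signal.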