🤖 AI Summary
This paper addresses three challenges in real-world robotic planning: unreachable targets, the rigidity of classical planning, and the infeasible or unsafe plans generated by large language models (LLMs). It proposes a hierarchical task planning framework that integrates classical planning with LLMs. The core contribution is a semantic-driven progressive goal relaxation mechanism: using LLM-based commonsense reasoning, it grounds semantic and geometric knowledge in a 3D scene graph and iteratively relaxes the original goal into context-adapted, functionally equivalent, executable sub-goals. The method combines PDDL-based symbolic planning, hierarchical task decomposition, and semantic grounding. Evaluated across diverse complex 3D scenarios, it significantly improves task success rates while ensuring safety and feasibility for long-horizon manipulation. Code, datasets, and evaluation benchmarks are publicly released.
📝 Abstract
Classical planning in AI and Robotics addresses complex tasks by shifting from imperative to declarative approaches (e.g., PDDL). However, these methods often fail in real scenarios due to limited robot perception and the need to ground perceptions to planning predicates. This often results in heavily hard-coded behaviors that struggle to adapt, even in scenarios where goals can be achieved through relaxed planning. Meanwhile, Large Language Models (LLMs) enable planning systems that leverage commonsense reasoning, but often at the cost of generating infeasible and/or unsafe plans. To address these limitations, we present an approach integrating classical planning with LLMs, leveraging their ability to extract commonsense knowledge and ground actions. We propose a hierarchical formulation that enables robots to make infeasible tasks tractable by defining functionally equivalent goals through gradual relaxation. This mechanism supports partial achievement of the intended objective, suited to the agent's specific context. Through comprehensive qualitative and quantitative evaluations, we demonstrate that our method adapts and executes tasks effectively within environments modeled using 3D Scene Graphs. We also show how this method succeeds in complex scenarios where other benchmark methods are more likely to fail. Code, dataset, and additional material are released to the community.
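The idea of gradual goal relaxation can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `Goal` type, the toy `SCENE`, the `is_feasible` check (standing in for a symbolic planner call), and the `propose_relaxations` table (standing in for an LLM query) are all hypothetical names and data introduced only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Goal:
    """A single symbolic goal, e.g. on(mug, top_shelf). Hypothetical type."""
    predicate: str
    obj: str
    target: str


# Hypothetical toy scene: known objects and locations the robot can reach.
SCENE = {
    "objects": {"mug", "plate"},
    "reachable": {"counter", "table"},
}


def is_feasible(goal: Goal, scene: dict) -> bool:
    """Stand-in for a planner call (assumption): a goal is feasible when
    the object exists and the target location is reachable."""
    return goal.obj in scene["objects"] and goal.target in scene["reachable"]


def propose_relaxations(goal: Goal) -> list[Goal]:
    """Stand-in for an LLM query (assumption): a hand-written table of
    functionally similar targets, ordered from closest to loosest."""
    alternatives = {"top_shelf": ["low_shelf", "counter"]}
    return [
        Goal(goal.predicate, goal.obj, target)
        for target in alternatives.get(goal.target, [])
    ]


def relax_until_feasible(
    goal: Goal, scene: dict, max_depth: int = 3
) -> Optional[tuple[Goal, int]]:
    """Try the original goal first, then progressively relaxed goals.

    Returns the first feasible goal and its relaxation depth, or None
    if no feasible goal is found within max_depth relaxation rounds.
    """
    frontier = [goal]
    seen = {goal}
    for depth in range(max_depth + 1):
        for candidate in frontier:
            if is_feasible(candidate, scene):
                return candidate, depth
        # No candidate at this depth works: relax each one a step further.
        next_frontier = []
        for candidate in frontier:
            for relaxed in propose_relaxations(candidate):
                if relaxed not in seen:
                    seen.add(relaxed)
                    next_frontier.append(relaxed)
        if not next_frontier:
            return None
        frontier = next_frontier
    return None


result = relax_until_feasible(Goal("on", "mug", "top_shelf"), SCENE)
print(result)
```

In this toy scene the top shelf is unreachable, so the search falls back to the first reachable alternative after one relaxation round. A real system would replace both stand-ins: the feasibility check with a PDDL planner over the 3D scene graph, and the lookup table with LLM-generated candidates.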