STEP Planner: Constructing cross-hierarchical subgoal tree as an embodied long-horizon task planner

📅 2025-06-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) reason poorly over long horizons, so using them directly to plan embodied tasks for real-world robots yields low success rates. Method: We propose a closed-loop hierarchical subgoal planning framework that constructs a cross-level subgoal tree: a base LLM performs coarse-grained task decomposition, while a leaf-node termination model uses environment-state feedback to decide when a subgoal can be converted directly into a primitive action and therefore needs no further decomposition. Contribution/Results: Our key innovation lies in decoupling task decomposition from the termination decision, which enables adaptive, verifiable hierarchical planning. Evaluated on the VirtualHome WAH-NL benchmark and a physical robot platform, our approach achieves success rates of up to 34% and 25%, respectively, outperforming prior methods.

📝 Abstract

The ability to perform reliable long-horizon task planning is crucial for deploying robots in real-world environments. However, directly employing Large Language Models (LLMs) as action sequence generators often results in low success rates due to their limited reasoning ability for long-horizon embodied tasks. In the STEP framework, we construct a subgoal tree through a pair of closed-loop models: a subgoal decomposition model and a leaf node termination model. Within this framework, we develop a hierarchical tree structure that spans from coarse to fine resolutions. The subgoal decomposition model leverages a foundation LLM to break down complex goals into manageable subgoals, thereby spanning the subgoal tree. The leaf node termination model provides real-time feedback based on environmental states, determining when to terminate the tree spanning and ensuring each leaf node can be directly converted into a primitive action. Experiments conducted both on the VirtualHome WAH-NL benchmark and on real robots demonstrate that STEP achieves long-horizon embodied task completion with success rates of up to 34% (WAH-NL) and 25% (real robot), outperforming SOTA methods.
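The interplay between the two models described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Every name here (`SubgoalNode`, `build_subgoal_tree`, `toy_decompose`, `toy_is_leaf`, `TOY_RULES`, the `primitive_verbs` state key) is hypothetical. A lookup table stands in for the foundation LLM, and a simple verb check stands in for the learned termination model. The environment state is also a static dictionary, whereas the paper describes real-time feedback.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SubgoalNode:
    """One node of the subgoal tree; nodes without children are leaves."""
    goal: str
    children: List["SubgoalNode"] = field(default_factory=list)


def build_subgoal_tree(
    goal: str,
    state: Dict,
    decompose: Callable[[str, Dict], List[str]],
    is_leaf: Callable[[str, Dict], bool],
    max_depth: int = 4,
) -> SubgoalNode:
    """Expand `goal` until the termination check accepts every leaf.

    `decompose` plays the role of the subgoal decomposition model and
    `is_leaf` the role of the leaf node termination model. `max_depth`
    is a safety bound added for this sketch, not taken from the paper.
    """
    node = SubgoalNode(goal)
    if max_depth == 0 or is_leaf(goal, state):
        return node
    for sub in decompose(goal, state):
        node.children.append(
            build_subgoal_tree(sub, state, decompose, is_leaf, max_depth - 1)
        )
    return node


def leaves(node: SubgoalNode) -> List[str]:
    """Collect leaf goals left to right; this is the action sequence."""
    if not node.children:
        return [node.goal]
    plan: List[str] = []
    for child in node.children:
        plan.extend(leaves(child))
    return plan


# Hypothetical stand-in for the LLM: a fixed coarse-to-fine lookup table.
TOY_RULES = {
    "place the apple in the fridge": [
        "get the apple",
        "store the apple in the fridge",
    ],
    "get the apple": ["walk to apple", "grab apple"],
    "store the apple in the fridge": [
        "walk to fridge",
        "open fridge",
        "put apple in fridge",
    ],
}


def toy_decompose(goal: str, state: Dict) -> List[str]:
    # Unknown goals yield no subgoals, so they end up as leaves.
    return TOY_RULES.get(goal, [])


def toy_is_leaf(goal: str, state: Dict) -> bool:
    # Assumption: a goal is primitive if it starts with a known action verb.
    return goal.split()[0] in state["primitive_verbs"]


state = {"primitive_verbs": {"walk", "grab", "open", "put"}}
tree = build_subgoal_tree(
    "place the apple in the fridge", state, toy_decompose, toy_is_leaf
)
plan = leaves(tree)
print(plan)
```

Passing the two models in as separate callables mirrors the decoupling the summary highlights: the decomposition step can be replaced without touching the rule that decides when expansion stops.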

Problem

Research questions and friction points this paper is trying to address.

Improving long-horizon task planning for robots using LLMs

Enhancing subgoal decomposition for complex embodied tasks

Increasing success rates in real-world robotic task completion

Innovation

Methods, ideas, or system contributions that make the work stand out.

Subgoal tree construction for long-horizon planning

Hierarchical tree from coarse to fine resolutions

Closed-loop subgoal decomposition and termination models

🔎 Similar Papers

Long-horizon Embodied Planning with Implicit Logical Inference and Hallucination Mitigation