🤖 AI Summary
This work addresses multi-object relocalization under movable obstacles, formulated as a geometric Task and Motion Planning (g-TAMP) problem. We propose an LLM-guided search framework: geometric scenes are encoded into predicate-based prompts compatible with large language models (LLMs), enabling the generation of an initial task plan to warm-start Monte Carlo Tree Search (MCTS)—thereby avoiding costly per-node LLM calls. To our knowledge, this is the first approach to employ LLMs for search guidance—rather than end-to-end action generation—in g-TAMP, establishing the “LLM-warm-started MCTS” paradigm. The method achieves superior robustness without sacrificing inference efficiency. Evaluated on six canonical g-TAMP benchmark tasks, it significantly outperforms both traditional search-based planners and state-of-the-art LLM-based planners in success rate and planning efficiency. The source code is publicly available.
📝 Abstract
The problem of relocating a set of objects to designated areas amidst movable obstacles can be framed as Geometric Task and Motion Planning (g-tamp), a subclass of task and motion planning (TAMP). Traditional approaches to g-tamp have relied either on domain-independent heuristics or on learning from planning experience to guide the search, both of which typically demand significant computational resources or data. In contrast, humans often use common sense to intuitively decide which objects to manipulate in g-tamp problems. Inspired by this, we propose leveraging Large Language Models (LLMs), which have common-sense knowledge acquired from internet-scale data, to guide task planning in g-tamp problems. To enable LLMs to perform geometric reasoning, we design a predicate-based prompt that encodes geometric information derived from a motion planning algorithm. We then query the LLM to generate a task plan, which is used to search for a feasible set of continuous parameters. Since LLMs are prone to mistakes, instead of committing to the LLM's outputs we extend Monte Carlo Tree Search (MCTS) to a hybrid action space and use the LLM to guide the search. Unlike previous approaches that call an LLM at every node and incur high computational costs, we warm-start the MCTS with the nodes explored while completing the LLM's task plan. On six different g-tamp problems, we show our method outperforms previous LLM planners and pure search algorithms. Code can be found at https://github.com/iMSquared/prime-the-search.
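The warm-start idea described above can be sketched in a few lines: instead of querying the LLM at every tree node, the search tree is pre-expanded once along the LLM's proposed task plan, and standard UCT selection then continues from that seeded tree. This is a minimal, hypothetical Python illustration; the `Node` class, `warm_start` helper, and the optimistic value backup are assumptions for exposition, not the paper's actual implementation.

```python
import math

class Node:
    """A search-tree node over (discrete action, resulting state) pairs."""
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb(node, c=1.4):
    """Standard UCT score used to select among children after warm-start."""
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def warm_start(root, llm_plan, step):
    """Seed the tree with the trajectory obtained by executing the
    LLM's task plan, so MCTS starts from these nodes instead of scratch.

    `step(state, action)` is a hypothetical transition function standing in
    for the motion-planning feasibility check over continuous parameters.
    """
    node = root
    for action in llm_plan:
        child = Node(step(node.state, action), parent=node, action=action)
        node.children.append(child)
        node = child
    # Back up one optimistic visit along the seeded path so UCT tries
    # the LLM's plan first but can still abandon it if it fails.
    # (Hypothetical backup scheme.)
    while node is not None:
        node.visits += 1
        node.value += 1.0
        node = node.parent
    return root
```

A toy usage, with integer states standing in for geometric configurations: `warm_start(Node(0), [1, 2, 3], lambda s, a: s + a)` yields a tree whose seeded path reaches state `6` after three actions, ready for further UCT rollouts.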