AI Summary
Existing task and motion planning (TAMP) approaches incur prohibitive computational overhead on long-horizon tasks because of excessive motion sampling, and although large language models (LLMs) encode commonsense priors, they lack 3D geometric and dynamical reasoning capabilities. This paper proposes a vision-language model (VLM)-driven TAMP framework that unifies symbolic task states and continuous motion states within a hybrid state tree. Crucially, it tightly couples VLM-based visual reasoning with dynamical validation during search through VLM-guided sampling, interleaved search strategies, joint verification using off-the-shelf motion planners and physics simulators, and visual rendering of intermediate states to refine the search direction. Evaluated in simulation and real-world settings, the method improves task success rates by 32.14% to 1166.67% over traditional and LLM-based baselines while reducing planning time on complex problems. Ablation studies confirm the critical role of VLM guidance in enhancing both efficiency and solution feasibility.
Abstract
Task and Motion Planning (TAMP) integrates high-level task planning with low-level motion feasibility, but existing methods are costly on long-horizon problems due to excessive motion sampling. While LLMs provide commonsense priors, they lack 3D spatial reasoning and cannot ensure geometric or dynamic feasibility. We propose a kinodynamic TAMP framework based on a hybrid state tree that uniformly represents symbolic and numeric states during planning, so that task and motion decisions are made jointly. Kinodynamic constraints embedded in the TAMP problem are verified by an off-the-shelf motion planner and physics simulator, while a vision-language model (VLM) guides the search for a TAMP solution and backtracks it based on visual renderings of the states. Experiments in simulated domains and in the real world show average success rates 32.14% to 1166.67% higher than those of traditional and LLM-based TAMP planners, along with reduced planning time on complex problems; ablations further highlight the benefits of VLM guidance.
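To make the hybrid state tree idea concrete, the following is a minimal sketch and not the authors' implementation. Every name in it (`HybridNode`, `search`, `propose`, `verify`, the two-slot toy domain) is hypothetical. `propose` stands in for the VLM that ranks candidate actions, and `verify` stands in for the motion planner and physics simulator; here both are simple hand-written rules so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List, Optional, Tuple

Pose = Tuple[float, float]       # toy 2D position (assumption: real poses are richer)
Action = Tuple[str, str, Pose]   # (operator name, object, target pose)


@dataclass
class HybridNode:
    """One node of a hybrid state tree: symbolic facts plus numeric poses."""
    facts: FrozenSet[str]
    poses: Dict[str, Pose]
    parent: Optional["HybridNode"] = None
    action: Optional[Action] = None

    def plan(self) -> List[Action]:
        """Walk parent links back to the root and return the action sequence."""
        node, steps = self, []
        while node.parent is not None:
            steps.append(node.action)
            node = node.parent
        return steps[::-1]


def search(
    node: HybridNode,
    goal: Callable[[HybridNode], bool],
    propose: Callable[[HybridNode], Iterable[Action]],
    verify: Callable[[HybridNode, Action], Optional[HybridNode]],
    max_depth: int = 6,
) -> Optional[HybridNode]:
    """Depth-first search: try proposals in ranked order, backtrack on failure."""
    if goal(node):
        return node
    if max_depth == 0:
        return None
    for action in propose(node):
        child = verify(node, action)   # None means the action is infeasible
        if child is None:
            continue
        found = search(child, goal, propose, verify, max_depth - 1)
        if found is not None:
            return found
    return None                        # dead end: caller tries its next proposal


# --- Hypothetical toy domain: place two blocks into two table slots ---------
SLOTS: List[Pose] = [(0.0, 0.0), (1.0, 0.0)]
# Assumption: block "b" can only be reached at the first slot.
REACHABLE = {"a": set(SLOTS), "b": {SLOTS[0]}}


def propose(node: HybridNode) -> Iterable[Action]:
    """Stand-in for VLM guidance: a fixed ranking over unplaced objects and slots."""
    for obj in sorted(REACHABLE):
        if f"placed({obj})" not in node.facts:
            for slot in SLOTS:
                yield ("place", obj, slot)


def verify(node: HybridNode, action: Action) -> Optional[HybridNode]:
    """Stand-in for motion planner + simulator: reachability and collision checks."""
    _, obj, target = action
    if target not in REACHABLE[obj]:
        return None                    # no feasible motion to the target
    if target in node.poses.values():
        return None                    # target already occupied
    return HybridNode(
        facts=node.facts | {f"placed({obj})"},
        poses={**node.poses, obj: target},
        parent=node,
        action=action,
    )


def goal(node: HybridNode) -> bool:
    return {"placed(a)", "placed(b)"} <= node.facts


root = HybridNode(frozenset(), {"a": (-1.0, 0.0), "b": (-1.0, 1.0)})
result = search(root, goal, propose, verify)
print(result.plan() if result else "no plan found")
```

In this toy run the first proposal puts block `a` in the only slot that `b` can reach, so the branch dead-ends and the search backtracks to place `a` in the second slot instead. Keeping symbolic facts and poses in the same node is what lets a single backtrack undo a task-level choice and its motion-level consequence together.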