🤖 AI Summary
In long-horizon vision-language-action (VLA) tasks, existing VLM-based planners rely on manual annotations or heuristic rules to decompose tasks into sub-tasks, creating a distributional mismatch with the training data of the underlying visuomotor policy and degrading performance. To address this, we propose a retrieval-augmented demonstration decomposition method that automatically aligns sub-tasks with the policy's visual feature distribution by matching against the low-level policy's training data, introducing retrieval into hierarchical task decomposition for the first time, without manual annotation or predefined rules. Our approach integrates VLM-based planning, visual feature retrieval, trajectory alignment, and hierarchical decomposition. Evaluated in both simulation and real-world settings, it outperforms state-of-the-art methods, with significant gains in robustness and cross-scene generalization.
📝 Abstract
To tackle long-horizon tasks, recent hierarchical vision-language-action (VLA) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can easily handle. Typically, the VLM planner is fine-tuned to learn to decompose a target task. This fine-tuning requires target-task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, heuristically segmented sub-tasks can deviate significantly from the training data of the visuomotor policy, which degrades task performance. To address this issue, we propose a Retrieval-based Demonstration Decomposer (RDD) that automatically decomposes demonstrations into sub-tasks by aligning the visual features of the decomposed sub-task intervals with those from the training data of the low-level visuomotor policies. Our method outperforms the state-of-the-art sub-task decomposer on both simulation and real-world tasks, demonstrating robustness across diverse settings. Code and more results are available at rdd-neurips.github.io.
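The core idea, scoring candidate sub-task intervals by how closely their visual features match the low-level policy's training data, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual algorithm: the function names, the cosine-similarity retrieval metric, and the brute-force single-boundary search are all assumptions for exposition (the real method decomposes full demonstrations into many intervals).

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two feature vectors (epsilon avoids div-by-zero).
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def interval_score(feats, start, end, bank):
    """Score the interval [start, end) of a demonstration by how well its
    mean visual feature matches its nearest neighbour in the low-level
    policy's training feature bank (retrieval step)."""
    mean_feat = feats[start:end].mean(axis=0)
    return max(cosine(mean_feat, b) for b in bank)

def best_split(feats, bank, min_len=2):
    """Brute-force a single sub-task boundary t that maximises the summed
    retrieval scores of the two resulting intervals. `feats` is a (T, D)
    array of per-frame visual features; `bank` is a list of D-dim features
    drawn from the visuomotor policy's training data."""
    T = len(feats)
    best_t, best_s = None, -np.inf
    for t in range(min_len, T - min_len + 1):
        s = interval_score(feats, 0, t, bank) + interval_score(feats, t, T, bank)
        if s > best_s:
            best_t, best_s = t, s
    return best_t, best_s
```

On a toy demonstration whose frames switch from one feature direction to another halfway through, with a bank containing both directions, the recovered boundary falls exactly at the transition, illustrating how retrieval aligns sub-task cuts with the policy's data rather than with hand-written rules.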