Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance

📅 2024-05-22
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the coarse-grained, insufficiently guiding reward functions common in robot manipulation skill learning, this paper proposes a fine-grained, failure-driven reward modeling method. First, it decomposes tasks into verifiable sub-steps via a "robotic chain-of-thought" (CoT) mechanism. Second, it leverages vision-language models (VLMs) to provide sparse yet precise failure signals at the sub-task level, enabling fine-grained reward construction. Third, it integrates VLM-guided self-imitation learning to accelerate policy optimization. This work is the first to jointly model sub-task-level VLM rewards and self-imitation learning. Evaluated across diverse manipulation tasks, the method achieves an average success rate 5.4× higher than the strongest baseline, RoboCLIP, and significantly outperforms mainstream approaches including CLIP and LIV.
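The reward construction described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the progress-fraction reward shape, and the mocked VLM query are all assumptions made for illustration; the paper's actual reward design may differ.

```python
# Illustrative sketch (hypothetical names, not the paper's implementation):
# a fine-grained reward built from sub-task-level VLM failure signals.
from typing import Callable, List


def make_subtask_reward(
    subtasks: List[str],
    vlm_failed: Callable[[str, object], bool],
) -> Callable[[object], float]:
    """Build a reward that credits each verified sub-step.

    subtasks: ordered, verifiable sub-steps from the robotic chain-of-thought.
    vlm_failed: queries a VLM with (sub-task description, observation) and
        returns True when it detects a failure -- a sparse but precise signal.
    """
    def reward(observation: object) -> float:
        completed = 0
        for subtask in subtasks:
            if vlm_failed(subtask, observation):
                break  # stop crediting at the first failed sub-step
            completed += 1
        # Fraction of sub-steps passed: finer-grained than a single
        # task-level success/failure signal.
        return completed / len(subtasks)
    return reward
```

For example, with sub-steps `["reach handle", "grasp handle", "open drawer"]` and a VLM that flags only the last step as failed, the reward is 2/3 rather than 0, which is the kind of intermediate guidance a coarse task-level reward cannot provide.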

๐Ÿ“ Abstract
Defining reward functions for skill learning has been a long-standing challenge in robotics. Recently, vision-language models (VLMs) have shown promise in defining reward signals for teaching robots manipulation skills. However, existing work often provides reward guidance that is too coarse, leading to insufficient learning processes. In this paper, we address this issue by implementing more fine-grained reward guidance. We decompose tasks into simpler sub-tasks, using this decomposition to offer more informative reward guidance with VLMs. We also propose a VLM-based self-imitation learning process to speed up learning. Empirical evidence demonstrates that our algorithm consistently outperforms baselines such as CLIP, LIV, and RoboCLIP. Specifically, our algorithm achieves a $5.4\times$ higher average success rate compared to the best baseline, RoboCLIP, across a series of manipulation tasks.
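The VLM-based self-imitation process mentioned in the abstract can be sketched as follows. This is an assumption-laden illustration, not the paper's algorithm: the class name, buffer policy, and the idea of replaying VLM-approved trajectories as behavior-cloning targets are all hypothetical.

```python
# Illustrative sketch (hypothetical design): VLM-guided self-imitation.
# Trajectories a VLM judges successful are stored and later replayed as
# (state, action) pairs for behavior-cloning-style policy updates.
import random
from typing import Callable, List, Tuple

Transition = Tuple[object, object]  # (state, action)


class SelfImitationBuffer:
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.trajectories: List[List[Transition]] = []

    def maybe_add(
        self,
        trajectory: List[Transition],
        vlm_success: Callable[[List[Transition]], bool],
    ) -> bool:
        """Store the trajectory only if the VLM judges it successful."""
        if not vlm_success(trajectory):
            return False
        self.trajectories.append(trajectory)
        if len(self.trajectories) > self.capacity:
            self.trajectories.pop(0)  # drop the oldest trajectory
        return True

    def sample_batch(self, n: int) -> List[Transition]:
        """Sample (state, action) pairs for an imitation update."""
        pairs = [p for traj in self.trajectories for p in traj]
        return random.sample(pairs, min(n, len(pairs)))
```

The intuition is that self-imitation lets the policy re-learn from its own best rollouts, which can speed up learning when reward signals are sparse; here the VLM plays the role of the success judge.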
Problem

Research questions and friction points this paper is trying to address.

Defining reward functions for robot skill learning remains a long-standing challenge
Existing VLM-based reward guidance is too coarse to drive effective learning
Policy learning is slow when reward signals are sparse
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse, sub-task-level failure guidance from VLMs
Fine-grained reward signals via robotic chain-of-thought task decomposition
VLM-guided self-imitation learning