Learning Manipulation Skills through Robot Chain-of-Thought with Sparse Failure Guidance

📅 2024-05-22
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the coarse-grained, insufficiently guiding reward functions common in robot manipulation skill learning, this paper proposes a fine-grained, failure-driven reward modeling method. First, it decomposes tasks into verifiable sub-steps via a "robotic chain-of-thought" (CoT) mechanism. Second, it leverages vision-language models (VLMs) to provide sparse yet precise failure signals at the sub-task level, enabling fine-grained reward construction. Third, it integrates VLM-guided self-imitation learning to accelerate policy optimization. This work is the first to jointly model sub-task-level VLM rewards and self-imitation learning. Evaluated across diverse manipulation tasks, the method achieves an average success rate 5.4× higher than the strongest baseline, RoboCLIP, and significantly outperforms mainstream approaches including CLIP and LIV.
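The reward construction described above can be illustrated with a minimal sketch. This is not the authors' code: the function names, the progress-fraction reward shape, and the mocked VLM query are all assumptions made for illustration; the paper's actual reward design may differ.

```python
# Illustrative sketch (hypothetical names, not the paper's implementation):
# a fine-grained reward built from sub-task-level VLM failure signals.
from typing import Callable, List


def make_subtask_reward(
    subtasks: List[str],
    vlm_failed: Callable[[str, object], bool],
) -> Callable[[object], float]:
    """Build a reward that credits each verified sub-step.

    subtasks: ordered, verifiable sub-steps from the robotic chain-of-thought.
    vlm_failed: queries a VLM with (sub-task description, observation) and
        returns True when it detects a failure -- a sparse but precise signal.
    """
    def reward(observation: object) -> float:
        completed = 0
        for subtask in subtasks:
            if vlm_failed(subtask, observation):
                break  # stop crediting at the first failed sub-step
            completed += 1
        # Fraction of sub-steps passed: finer-grained than a single
        # task-level success/failure signal.
        return completed / len(subtasks)
    return reward
```

For example, with sub-steps `["reach handle", "grasp handle", "open drawer"]` and a VLM that flags only the last step as failed, the reward is 2/3 rather than 0, which is the kind of intermediate guidance a coarse task-level reward cannot provide.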

๐Ÿ“ Abstract
Defining reward functions for skill learning has been a long-standing challenge in robotics. Recently, vision-language models (VLMs) have shown promise in defining reward signals for teaching robots manipulation skills. However, existing work often provides reward guidance that is too coarse, leading to insufficient learning processes. In this paper, we address this issue by implementing more fine-grained reward guidance. We decompose tasks into simpler sub-tasks, using this decomposition to offer more informative reward guidance with VLMs. We also propose a VLM-based self-imitation learning process to speed up learning. Empirical evidence demonstrates that our algorithm consistently outperforms baselines such as CLIP, LIV, and RoboCLIP. Specifically, our algorithm achieves a $5.4\times$ higher average success rate compared to the best baseline, RoboCLIP, across a series of manipulation tasks.
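The VLM-based self-imitation process mentioned in the abstract can be sketched as follows. This is an assumption-laden illustration, not the paper's algorithm: the class name, buffer policy, and the idea of replaying VLM-approved trajectories as behavior-cloning targets are all hypothetical.

```python
# Illustrative sketch (hypothetical design): VLM-guided self-imitation.
# Trajectories a VLM judges successful are stored and later replayed as
# (state, action) pairs for behavior-cloning-style policy updates.
import random
from typing import Callable, List, Tuple

Transition = Tuple[object, object]  # (state, action)


class SelfImitationBuffer:
    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.trajectories: List[List[Transition]] = []

    def maybe_add(
        self,
        trajectory: List[Transition],
        vlm_success: Callable[[List[Transition]], bool],
    ) -> bool:
        """Store the trajectory only if the VLM judges it successful."""
        if not vlm_success(trajectory):
            return False
        self.trajectories.append(trajectory)
        if len(self.trajectories) > self.capacity:
            self.trajectories.pop(0)  # drop the oldest trajectory
        return True

    def sample_batch(self, n: int) -> List[Transition]:
        """Sample (state, action) pairs for an imitation update."""
        pairs = [p for traj in self.trajectories for p in traj]
        return random.sample(pairs, min(n, len(pairs)))
```

The intuition is that self-imitation lets the policy re-learn from its own best rollouts, which can speed up learning when reward signals are sparse; here the VLM plays the role of the success judge.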
Problem

Research questions and friction points this paper is trying to address.

Defining reward functions for robot skill learning remains a long-standing challenge
Existing VLM-based reward guidance is too coarse to drive effective learning
Policy learning is slow when reward signals are sparse
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse, sub-task-level failure guidance from VLMs
Fine-grained reward signals via robotic chain-of-thought task decomposition
VLM-guided self-imitation learning