One-Shot Imitation under Mismatched Execution

📅 2024-09-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the misalignment in one-shot imitation learning between humans and robots, caused by disparities in motion style and physical capability, this paper proposes RHyME, a framework that automatically aligns human and robot task executions by minimizing an optimal transport cost. RHyME translates long-horizon human demonstrations into robot-executable policies without requiring paired human-robot data. Methodologically, it uses cross-modal video retrieval and synthesis: given a long-horizon robot demonstration, it retrieves and recomposes semantically equivalent segments from a library of short-horizon human clips, and it combines semantically aligned representation learning with unpaired imitation learning to establish a style-invariant mapping from demonstration to action. Evaluated in both simulation and real-world experiments with human hand demonstrations, RHyME achieves over 50% higher task success than state-of-the-art methods, demonstrating robust generalization across diverse cross-embodiment demonstrators.
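To make the alignment idea concrete, here is a minimal, illustrative sketch (not the paper's released code) that soft-aligns a human and a robot video by minimizing an entropy-regularized optimal transport cost between their per-frame embeddings via Sinkhorn iterations. The feature extractor, the regularization weight, and the uniform frame marginals are all assumptions made for illustration.

```python
# Illustrative sketch of OT-based execution alignment (assumed setup,
# not the authors' implementation): soft-match two embedded videos.
import numpy as np

def sinkhorn_alignment(human_feats, robot_feats, reg=0.1, n_iters=100):
    """Return a soft matching (transport plan) between two feature sequences.

    human_feats: (T_h, d) array of per-frame human video embeddings.
    robot_feats: (T_r, d) array of per-frame robot video embeddings.
    """
    # Pairwise cost: squared Euclidean distance between frame embeddings,
    # rescaled to [0, 1] so exp(-C / reg) stays numerically stable.
    diff = human_feats[:, None, :] - robot_feats[None, :, :]
    C = np.sum(diff ** 2, axis=-1)
    C = C / C.max()

    # Uniform marginals: every frame carries equal mass.
    a = np.full(len(human_feats), 1.0 / len(human_feats))
    b = np.full(len(robot_feats), 1.0 / len(robot_feats))

    # Sinkhorn iterations on the Gibbs kernel K = exp(-C / reg).
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)

    P = u[:, None] * K * v[None, :]   # transport plan over frame pairs
    cost = np.sum(P * C)              # alignment cost between the two videos
    return P, cost
```

A low transport cost indicates that the two executions are semantically close even when their speeds and styles differ, which is what frame-level visual similarity fails to capture.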

📝 Abstract
Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities. Existing methods either depend on human-robot paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice. To address these challenges, we propose RHyME, a novel framework that automatically aligns human and robot task executions using optimal transport costs. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data. RHyME successfully imitates a range of cross-embodiment demonstrators, both in simulation and with a real human hand, achieving an over 50% increase in task success compared to previous methods. We release our code and datasets at https://portal-cornell.github.io/rhyme/.
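The retrieve-and-compose step described in the abstract can be sketched in a few lines. The interfaces below are assumptions for illustration, not the released RHyME code: each segment of a long-horizon robot demonstration is matched to its nearest short-horizon human clip by cosine similarity in a shared embedding space, and the retrieved clips are concatenated into a semantically equivalent surrogate human video.

```python
# Minimal sketch of retrieval-based human video synthesis (assumed
# interfaces; segmentation and the embedding model are not shown).
import numpy as np

def retrieve_and_compose(robot_segment_embs, human_clip_embs, human_clips):
    """Compose a surrogate human video for a long-horizon robot demo.

    robot_segment_embs: (S, d) embeddings of the robot demo's segments.
    human_clip_embs:    (N, d) embeddings of a library of short human clips.
    human_clips:        list of N clips (e.g., arrays of frames).
    """
    # Normalize so dot products are cosine similarities.
    r = robot_segment_embs / np.linalg.norm(robot_segment_embs, axis=1, keepdims=True)
    h = human_clip_embs / np.linalg.norm(human_clip_embs, axis=1, keepdims=True)

    # For each robot segment, pick the most similar human clip.
    sims = r @ h.T                    # (S, N) cosine similarity matrix
    nearest = sims.argmax(axis=1)     # best-matching clip index per segment

    # Concatenate retrieved clips in task order to form the paired human video.
    return [human_clips[i] for i in nearest]
```

The composed videos stand in for the human-robot paired data that existing methods require, which is how policy training proceeds without collecting such pairs.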
Problem

Research questions and friction points this paper is trying to address.

Addresses execution mismatches in human-robot task demonstrations.
Eliminates need for human-robot paired data in policy training.
Improves task success rates through cross-embodiment imitation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses optimal transport for human-robot execution alignment.
Synthesizes semantically equivalent human videos from robot demonstrations.
Enables effective policy training without paired human-robot data.