One-Shot Imitation under Mismatched Execution

📅 2024-09-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the misalignment in one-shot imitation learning between humans and robots, caused by disparities in motion style and physical capability, this paper proposes RHyME, a framework that automatically aligns human and robot task executions by minimizing an optimal transport cost. RHyME translates long-horizon human demonstrations into robot-executable policies without requiring paired human-robot data. Methodologically, it uses cross-modal video retrieval and synthesis: given a long-horizon robot demonstration, it retrieves and recomposes semantically equivalent segments from a library of short-horizon human clips, and it combines semantically aligned representation learning with unpaired imitation learning to establish a style-invariant mapping from demonstration to action. Evaluated in both simulation and real-world experiments with human hand demonstrations, RHyME achieves over 50% higher task success than state-of-the-art methods, demonstrating robust generalization across diverse cross-embodiment demonstrators.
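To make the alignment idea concrete, here is a minimal, illustrative sketch (not the paper's released code) that soft-aligns a human and a robot video by minimizing an entropy-regularized optimal transport cost between their per-frame embeddings via Sinkhorn iterations. The feature extractor, the regularization weight, and the uniform frame marginals are all assumptions made for illustration.

```python
# Illustrative sketch of OT-based execution alignment (assumed setup,
# not the authors' implementation): soft-match two embedded videos.
import numpy as np

def sinkhorn_alignment(human_feats, robot_feats, reg=0.1, n_iters=100):
    """Return a soft matching (transport plan) between two feature sequences.

    human_feats: (T_h, d) array of per-frame human video embeddings.
    robot_feats: (T_r, d) array of per-frame robot video embeddings.
    """
    # Pairwise cost: squared Euclidean distance between frame embeddings,
    # rescaled to [0, 1] so exp(-C / reg) stays numerically stable.
    diff = human_feats[:, None, :] - robot_feats[None, :, :]
    C = np.sum(diff ** 2, axis=-1)
    C = C / C.max()

    # Uniform marginals: every frame carries equal mass.
    a = np.full(len(human_feats), 1.0 / len(human_feats))
    b = np.full(len(robot_feats), 1.0 / len(robot_feats))

    # Sinkhorn iterations on the Gibbs kernel K = exp(-C / reg).
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)

    P = u[:, None] * K * v[None, :]   # transport plan over frame pairs
    cost = np.sum(P * C)              # alignment cost between the two videos
    return P, cost
```

A low transport cost indicates that the two executions are semantically close even when their speeds and styles differ, which is what frame-level visual similarity fails to capture.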

📝 Abstract
Human demonstrations as prompts are a powerful way to program robots to do long-horizon manipulation tasks. However, translating these demonstrations into robot-executable actions presents significant challenges due to execution mismatches in movement styles and physical capabilities. Existing methods either depend on human-robot paired data, which is infeasible to scale, or rely heavily on frame-level visual similarities that often break down in practice. To address these challenges, we propose RHyME, a novel framework that automatically aligns human and robot task executions using optimal transport costs. Given long-horizon robot demonstrations, RHyME synthesizes semantically equivalent human videos by retrieving and composing short-horizon human clips. This approach facilitates effective policy training without the need for paired data. RHyME successfully imitates a range of cross-embodiment demonstrators, both in simulation and with a real human hand, achieving an over 50% increase in task success compared to previous methods. We release our code and datasets at https://portal-cornell.github.io/rhyme/.
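The retrieve-and-compose step described in the abstract can be sketched in a few lines. The interfaces below are assumptions for illustration, not the released RHyME code: each segment of a long-horizon robot demonstration is matched to its nearest short-horizon human clip by cosine similarity in a shared embedding space, and the retrieved clips are concatenated into a semantically equivalent surrogate human video.

```python
# Minimal sketch of retrieval-based human video synthesis (assumed
# interfaces; segmentation and the embedding model are not shown).
import numpy as np

def retrieve_and_compose(robot_segment_embs, human_clip_embs, human_clips):
    """Compose a surrogate human video for a long-horizon robot demo.

    robot_segment_embs: (S, d) embeddings of the robot demo's segments.
    human_clip_embs:    (N, d) embeddings of a library of short human clips.
    human_clips:        list of N clips (e.g., arrays of frames).
    """
    # Normalize so dot products are cosine similarities.
    r = robot_segment_embs / np.linalg.norm(robot_segment_embs, axis=1, keepdims=True)
    h = human_clip_embs / np.linalg.norm(human_clip_embs, axis=1, keepdims=True)

    # For each robot segment, pick the most similar human clip.
    sims = r @ h.T                    # (S, N) cosine similarity matrix
    nearest = sims.argmax(axis=1)     # best-matching clip index per segment

    # Concatenate retrieved clips in task order to form the paired human video.
    return [human_clips[i] for i in nearest]
```

The composed videos stand in for the human-robot paired data that existing methods require, which is how policy training proceeds without collecting such pairs.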
Problem

Research questions and friction points this paper is trying to address.

Addresses execution mismatches in human-robot task demonstrations.
Eliminates need for human-robot paired data in policy training.
Improves task success rates through cross-embodiment imitation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses optimal transport for human-robot execution alignment.
Synthesizes semantically equivalent human videos from robot demonstrations.
Enables effective policy training without paired human-robot data.