🤖 AI Summary
Existing imitation learning methods for single-video demonstrations model the task as frame-level distribution matching. Because execution variability and time-varying dynamics introduce temporal misalignment, frame-level matching fails to enforce the temporal ordering of subgoals or guarantee consistent progress.
Method: We propose the Ordered Coverage Alignment (ORCA) paradigm, which abandons explicit temporal alignment and instead models ordered coverage relationships among subgoals. ORCA introduces a dense temporal reward function and a subgoal progression probability model, integrated into an end-to-end reinforcement learning framework.
Contribution/Results: ORCA is the first approach to formalize single-demonstration imitation learning as an ordered subgoal coverage problem, enabling reliable policy learning for sequential tasks without explicit temporal alignment. Evaluated on Meta-World and Humanoid-v4, it achieves average performance improvements of 4.5× and 6.6× over the best frame-alignment baselines, respectively, and demonstrates strong robustness to highly misaligned demonstrations.
📝 Abstract
We examine the problem of learning sequential tasks from a single visual demonstration. A key challenge arises when demonstrations are temporally misaligned due to variations in timing, differences in embodiment, or inconsistencies in execution. Existing approaches treat imitation as a distribution-matching problem, aligning individual frames between the agent and the demonstration. However, we show that such frame-level matching fails to enforce temporal ordering or ensure consistent progress. Our key insight is that matching should instead be defined at the level of sequences. We propose that perfect matching occurs when one sequence successfully covers all the subgoals in the same order as the other sequence. We present ORCA (ORdered Coverage Alignment), a dense per-timestep reward function that measures the probability of the agent covering demonstration frames in the correct order. On temporally misaligned demonstrations, we show that agents trained with the ORCA reward achieve $4.5$x improvement ($0.11 \rightarrow 0.50$ average normalized returns) for Meta-world tasks and $6.6$x improvement ($6.55 \rightarrow 43.3$ average returns) for Humanoid-v4 tasks compared to the best frame-level matching algorithms. We also provide empirical analysis showing that ORCA is robust to varying levels of temporal misalignment. Our code is available at https://github.com/portal-cornell/orca/
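The abstract describes ORCA as a dense per-timestep reward measuring the probability that the agent covers demonstration frames in order. The paper's exact probability model is not reproduced here, so the following is a minimal illustrative sketch of one way such an ordered-coverage reward could be computed, assuming a precomputed agent-frame vs. demo-subgoal match-probability matrix (the names `match_prob` and `orca_style_reward` are hypothetical, not from the paper's codebase):

```python
import numpy as np

def orca_style_reward(match_prob: np.ndarray) -> np.ndarray:
    """Illustrative ordered-coverage reward (not the official ORCA code).

    match_prob[t, k]: probability that agent frame t matches demo subgoal k.
    Returns a dense reward r[t] summarizing in-order coverage progress.
    """
    T, K = match_prob.shape
    # c[t, k]: best probability of having covered subgoals 0..k,
    # in order, by timestep t (Viterbi-style dynamic program).
    c = np.zeros((T, K))
    c[0, 0] = match_prob[0, 0]
    for t in range(1, T):
        # Subgoal 0 can be (re)achieved at any time.
        c[t, 0] = max(c[t - 1, 0], match_prob[t, 0])
        for k in range(1, K):
            # Either subgoal k was already covered, or it is covered now,
            # which requires subgoals 0..k-1 to have been covered before.
            c[t, k] = max(c[t - 1, k], c[t - 1, k - 1] * match_prob[t, k])
    # Dense per-timestep reward: expected number of subgoals covered in
    # order so far, which is monotonically non-decreasing in t.
    return c.sum(axis=1)
```

Because the dynamic program conditions coverage of subgoal k on prior coverage of subgoals 0..k-1, an agent that hits the right frames in the wrong order gets little credit, which is the qualitative behavior the abstract attributes to sequence-level (rather than frame-level) matching.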