Rewind-IL: Online Failure Detection and State Respawning for Imitation Learning

📅 2026-04-17

🤖 AI Summary
This work addresses unrecoverable failures in long-horizon action-chunked imitation learning caused by execution drift at deployment, and proposes Rewind-IL, a training-free online safeguard framework. The approach combines a zero-shot failure detector based on the Temporal Inter-chunk Discrepancy Estimate (TIDE), calibrated with split conformal prediction, with a state-respawning mechanism that returns the robot to a safe intermediate state identified by a vision-language model. A checkpoint feature bank built with the frozen policy encoder tracks similarity to these safe states, and after a rewind the policy restarts inference from a clean state, so no additional training is required. Experiments on real-world and simulated long-horizon manipulation tasks, including transfer to flow-matching action-chunked policies, indicate improved policy reliability.
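To make the checkpoint feature bank concrete, the following is a minimal sketch of a nearest-checkpoint lookup. It assumes each checkpoint is stored as one feature vector from the frozen policy encoder and that similarity is measured with cosine similarity; the summary does not state the actual similarity measure, and the function name `nearest_checkpoint` is hypothetical.

```python
import numpy as np


def nearest_checkpoint(feature, checkpoint_bank):
    """Return (index, cosine similarity) of the bank entry closest to `feature`.

    feature:         (D,) encoder feature of the current observation.
    checkpoint_bank: (K, D) features of the K stored checkpoints.
    Cosine similarity is an assumption of this sketch, not the paper's stated choice.
    """
    bank = np.asarray(checkpoint_bank, dtype=float)
    query = np.asarray(feature, dtype=float)
    # Normalize rows and the query so the dot product equals cosine similarity.
    bank_unit = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    query_unit = query / np.linalg.norm(query)
    similarities = bank_unit @ query_unit
    best = int(np.argmax(similarities))
    return best, float(similarities[best])
```

In a deployment loop, a lookup of this kind could be used to record the most recently passed checkpoint, which would then serve as the rewind target when a failure is flagged.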

📝 Abstract
Imitation learning has enabled robots to acquire complex visuomotor manipulation skills from demonstrations, but deployment failures remain a major obstacle, especially for long-horizon action-chunked policies. Once execution drifts off the demonstration manifold, these policies often continue producing locally plausible actions without recovering from the failure. Existing runtime monitors either require failure data, over-trigger under benign feature drift, or stop at failure detection without providing a recovery mechanism. We present Rewind-IL, a training-free online safeguard framework for generative action-chunked imitation policies. Rewind-IL combines a zero-shot failure detector based on Temporal Inter-chunk Discrepancy Estimate (TIDE), calibrated with split conformal prediction, with a state-respawning mechanism that returns the robot to a semantically verified safe intermediate state. Offline, a vision-language model identifies recovery checkpoints in demonstrations, and the frozen policy encoder is used to construct a compact checkpoint feature database. Online, Rewind-IL monitors self-consistency in overlapping action chunks, tracks similarity to the checkpoint library, and, upon failure, rewinds execution to the latest verified safe state before restarting inference from a clean policy state. Experiments on real-world and simulated long-horizon manipulation tasks, including transfer to flow-matching action-chunked policies, demonstrate that policy-internal consistency coupled with semantically grounded respawning offers a practical route to improved reliability in imitation learning. Supplemental materials are available at https://sjay05.github.io/rewind-il
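The detection idea in the abstract (self-consistency across overlapping action chunks, thresholded with split conformal prediction) can be illustrated with a small sketch. The abstract does not give the exact TIDE formula, so the score below is an assumed stand-in: the mean L2 distance between the actions that two consecutive chunks predict for the same timesteps. The names `overlap_discrepancy` and `split_conformal_threshold`, the `stride` parameter, and the default `alpha` are all hypothetical.

```python
import numpy as np


def overlap_discrepancy(prev_chunk, next_chunk, stride):
    """Disagreement between two consecutive action chunks on shared timesteps.

    prev_chunk, next_chunk: (H, action_dim) arrays of predicted actions.
    stride: number of steps executed before `next_chunk` was predicted,
            so prev_chunk[stride:] and next_chunk[:H - stride] overlap in time.
    Assumed stand-in for the paper's TIDE score, not its actual definition.
    """
    prev_chunk = np.asarray(prev_chunk, dtype=float)
    next_chunk = np.asarray(next_chunk, dtype=float)
    horizon = prev_chunk.shape[0]
    if not 0 < stride < horizon:
        raise ValueError("stride must satisfy 0 < stride < chunk horizon")
    diff = prev_chunk[stride:] - next_chunk[: horizon - stride]
    return float(np.linalg.norm(diff, axis=1).mean())


def split_conformal_threshold(calibration_scores, alpha=0.05):
    """Standard split conformal quantile of held-out successful-rollout scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or infinity when the calibration set is too small for the requested alpha.
    """
    scores = np.sort(np.asarray(calibration_scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")
    return float(scores[k - 1])
```

Under these assumptions, a step would be flagged as a failure when its discrepancy exceeds the calibrated threshold. Because the threshold is computed only from scores on successful rollouts, no failure data is needed, which matches the zero-shot framing of the abstract.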
Problem

Research questions and friction points this paper is trying to address.

Imitation Learning
Failure Detection
State Respawning
Action-chunked Policies
Long-horizon Manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rewind-IL
Failure Detection
State Respawning
Action-chunked Imitation Learning
Temporal Inter-chunk Discrepancy
Gehan Zheng
College of Connected Computing, Vanderbilt University, USA.
Sanjay Seenivasan
College of Connected Computing, Vanderbilt University, USA; School of Computer Science, University of Waterloo, Canada.
Matthew Johnson-Roberson
Professor of Robotics, Carnegie Mellon University
Robotics, Field Robotics, Autonomous Vehicles, Marine Robotics
Weiming Zhi
College of Connected Computing, Vanderbilt University, USA; School of Computer Science, The University of Sydney, Australia; Australian Centre for Robotics, The University of Sydney, Australia.