AI Summary
Visuomotor diffusion policies struggle with out-of-distribution (OOD) states in real-world settings and lack mechanisms for real-time human intervention. Method: This paper introduces the Real-Time Operator Takeover (RTOT) paradigm, featuring (1) an online anomaly detection mechanism based on the Mahalanobis distance for identifying OOD states, and (2) a takeover-demonstration-driven incremental policy training framework that directly incorporates human takeover actions into the policy optimization loop. Contribution/Results: RTOT improves policy generalization and robustness. Under equivalent data budgets, it achieves substantially higher task success rates than conventional long-horizon imitation learning. Real-robot rice-scooping experiments validate its capabilities in critical failure point detection, rapid recovery, and seamless human–robot collaborative control.
Abstract
We present a Real-Time Operator Takeover (RTOT) paradigm enabling operators to seamlessly take control of a live visuomotor diffusion policy, guiding the system back into desirable states or reinforcing specific demonstrations. We present new insights into using the Mahalanobis distance to automatically identify undesirable states. Once the operator has intervened and redirected the system, control is seamlessly returned to the policy, which resumes generating actions until further intervention is required. We demonstrate that incorporating the targeted takeover demonstrations significantly improves policy performance compared to training solely with an equivalent number of, but longer, initial demonstrations. We provide an in-depth analysis of using the Mahalanobis distance to detect out-of-distribution states, illustrating its utility for identifying critical failure points during execution. Supporting materials, including videos of initial and takeover demonstrations and all rice-scooping experiments, are available on the project website: https://operator-takeover.github.io/
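To make the OOD-detection idea concrete, here is a minimal sketch of a Mahalanobis-distance check of the kind the abstract describes: fit a Gaussian to features of in-distribution states, then flag any state whose distance from that distribution exceeds a calibrated threshold. The feature dimensionality, the percentile-based threshold, and all function names are illustrative assumptions, not details from the paper.

```python
import numpy as np

def fit_gaussian(feats):
    """Fit mean and regularized inverse covariance to in-distribution features."""
    mu = feats.mean(axis=0)
    cov = np.cov(feats, rowvar=False) + 1e-6 * np.eye(feats.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(x, mu, cov_inv):
    """Mahalanobis distance of a single feature vector x from the fitted Gaussian."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Calibrate on features of in-distribution (demonstration) states.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 8))   # stand-in for policy features
mu, cov_inv = fit_gaussian(train)

# Choose a threshold, e.g. a high percentile of training distances.
dists = np.array([mahalanobis(f, mu, cov_inv) for f in train])
tau = np.percentile(dists, 99)

# A state far from the training distribution exceeds the threshold,
# which would trigger an operator-takeover request.
ood_state = np.full(8, 6.0)
print(mahalanobis(ood_state, mu, cov_inv) > tau)  # True
```

In practice the threshold would be tuned so that interventions are requested at genuine failure points rather than on benign variation; the percentile rule above is just one simple calibration choice.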