🤖 AI Summary
Standard behavioral cloning struggles on contact-rich or dynamic teleoperation tasks because it overlooks the intent–execution discrepancy that arises when human operators compensate for hardware limitations such as latency, friction, and the lack of force feedback. This work proposes a dual-state conditioning framework that shifts the learning objective from mimicking the robot’s executed trajectory to cloning the operator’s master-side commands, termed “intent cloning.” It treats the intent–execution discrepancy not as noise but as a signal that encodes implicit interaction forces and system dynamics, which enables implicit impedance control and implicit system identification without force sensors. By formulating the policy as a trajectory inpainter that bridges inference latency and by conditioning on the history of the discrepancy, the method succeeds where conventional execution cloning fails on a low-cost bimanual teleoperation platform without force sensing, in both high-stiffness contact and dynamic tracking tasks.
📝 Abstract
Teleoperation inherently relies on the human operator acting as a closed-loop controller to actively compensate for hardware imperfections, including latency, mechanical friction, and lack of explicit force feedback. Standard Behavior Cloning (BC), by mimicking the robot's executed trajectory, fundamentally ignores this compensatory mechanism. In this work, we propose a Dual-State Conditioning framework that shifts the learning objective to "Intent Cloning", that is, predicting the master command rather than the executed trajectory. We posit that the Intent-Execution Mismatch, the discrepancy between master command and slave response, is not noise but a critical signal that physically encodes implicit interaction forces and algorithmically reveals the operator's strategy for overcoming system dynamics. By predicting the master intent, our policy learns to generate a "virtual equilibrium point", effectively realizing implicit impedance control. Furthermore, by explicitly conditioning on the history of this mismatch, the model performs implicit system identification, perceiving tracking errors as external forces to close the control loop. To bridge the temporal gap caused by inference latency, we further formulate the policy as a trajectory inpainter to ensure continuous control. We validate our approach on a sensorless, low-cost bimanual setup. Empirical results across tasks requiring contact-rich manipulation and dynamic tracking reveal a decisive gap: while standard execution cloning fails because it cannot overcome contact stiffness and tracking lag, our mismatch-aware approach achieves robust success. This presents a minimalist behavior cloning framework for low-cost hardware, enabling force perception and dynamic compensation without relying on explicit force sensing. Videos are available on the project page: https://xucj98.github.io/mind-the-gap-page/
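To make the "virtual equilibrium point" idea concrete, here is a minimal one-dimensional sketch. It is not the paper's implementation. It assumes the slave's position controller behaves like a spring of stiffness `k_ctrl` pulling toward the command, and that the environment is a stiff spring `k_wall` beyond a surface at `wall`. All names and numbers (`simulate_contact`, `estimate_force`, `dual_state_obs`, the stiffness values, the feature layout) are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_contact(command, wall=0.05, k_ctrl=200.0, k_wall=5000.0):
    """Static equilibrium of a position-controlled axis pressing on a stiff wall.

    Assumed toy model: the controller is a spring k_ctrl toward `command`,
    the wall is a spring k_wall that engages once position exceeds `wall`.
    Returns (executed_position, contact_force).
    """
    if command <= wall:
        return command, 0.0  # free space: executed position tracks the command
    # Force balance: k_ctrl * (command - x) = k_wall * (x - wall)
    x = (k_ctrl * command + k_wall * wall) / (k_ctrl + k_wall)
    return x, k_wall * (x - wall)

def estimate_force(command, executed, k_ctrl=200.0):
    """Contact force inferred from the intent-execution mismatch alone."""
    return k_ctrl * (command - executed)

def dual_state_obs(master_hist, slave_hist):
    """Hypothetical observation: slave states plus mismatch history, flattened.

    Both inputs have shape (T, D); the output has length 2 * T * D.
    """
    master_hist = np.asarray(master_hist, dtype=float)
    slave_hist = np.asarray(slave_hist, dtype=float)
    mismatch = master_hist - slave_hist
    return np.concatenate([slave_hist, mismatch], axis=-1).ravel()

# Operator commands a point "inside" the wall to press on it (the intent).
intent = 0.10
executed, force = simulate_contact(intent)

# Replaying the executed position as the command (execution cloning)
# yields only a small fraction of the demonstrated contact force.
_, replay_force = simulate_contact(executed)

print(f"demonstrated force: {force:.2f}")
print(f"force inferred from mismatch: {estimate_force(intent, executed):.2f}")
print(f"force when replaying executed position: {replay_force:.2f}")
```

Under these assumptions the mismatch recovers the contact force without a force sensor, while commanding the executed position places the equilibrium point almost on the surface and the pressing force largely disappears. This mirrors the failure mode the abstract attributes to execution cloning on high-stiffness contact.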