🤖 AI Summary
This work addresses the challenge of enabling dual-arm robots to achieve high-precision collaborative manipulation from a single human demonstration, without requiring task-specific prior knowledge, object models, or any additional data collection or training. We propose a three-stage visual servoing (3-VS) alignment method and a novel dual-arm coordination paradigm, achieving, for the first time, end-to-end one-shot generalization of dual-arm skills. Our approach integrates visual servoing guidance, kinematic decoupling, and synchronized control to enable both trajectory replay and real-time inter-arm coordination. We deploy the method on a physical dual-arm platform to execute six everyday tasks in both 4-DoF and 6-DoF settings, including insertion, assembly, and grasp-and-pass, demonstrating strong robustness against distractors and partial occlusions. Experimental results show state-of-the-art localization accuracy and task success rates.
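The coarse-to-fine idea behind a staged visual-servoing alignment can be sketched as a sequence of proportional servo loops with progressively tighter tolerances. This is a minimal illustration only; the stage count, gains, tolerances, and pure proportional update are assumptions for exposition, not the paper's actual 3-VS method:

```python
import numpy as np

def servo_stage(pose, target, gain, tol, max_iters=200):
    """One servoing stage: proportional steps toward the target pose
    until the alignment error falls below tol (or max_iters is hit)."""
    for _ in range(max_iters):
        error = target - pose
        if np.linalg.norm(error) < tol:
            break
        pose = pose + gain * error  # simple proportional update (illustrative)
    return pose

def staged_alignment(pose, target):
    """Hypothetical three-stage schedule: coarse -> medium -> fine,
    tightening the convergence tolerance at each stage."""
    for gain, tol in [(0.5, 1e-1), (0.3, 1e-2), (0.1, 1e-3)]:
        pose = servo_stage(pose, target, gain, tol)
    return pose

# Example: align a 3-D end-effector position with a target
aligned = staged_alignment(np.array([1.0, 2.0, 3.0]), np.zeros(3))
```

Once alignment converges within the final tolerance, replaying the demonstrated trajectory from the aligned pose is what makes one-shot execution possible.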
📝 Abstract
We introduce One-Shot Dual-Arm Imitation Learning (ODIL), which enables dual-arm robots to learn precise and coordinated everyday tasks from just a single demonstration. ODIL uses a new three-stage visual servoing (3-VS) method for precise alignment between the end-effector and target object, after which replaying the demonstration trajectory is sufficient to perform the task. This is achieved without requiring prior task or object knowledge, or any additional data collection and training following the single demonstration. Furthermore, we propose a new dual-arm coordination paradigm for learning dual-arm tasks from a single demonstration. ODIL was tested on a real-world dual-arm robot, demonstrating state-of-the-art performance across six precise and coordinated tasks in both 4-DoF and 6-DoF settings, and showing robustness in the presence of distractor objects and partial occlusions. Videos are available at: https://www.robot-learning.uk/one-shot-dual-arm.