🤖 AI Summary
This work addresses the scarcity of real-world data for bimanual robot policy learning by introducing a collaboration-swapping paradigm. A teleoperated single-arm robot performs one side of a bimanual task while a human performs the other, and the two then swap roles so that the collected demonstrations cover both sides. By combining hand pose estimation, RGB-D perception, image inpainting, and kinematic modeling, the method synthesizes demonstrations tailored to a target bimanual robot for imitation learning. This framework enables, for the first time, efficient transfer from human–single-arm collaborative data to unseen bimanual robot configurations, supporting both zero-shot deployment and few-shot fine-tuning. Evaluated on five bimanual tasks, the method achieves success rates of up to 70% in zero-shot settings; with only 25 target-robot demonstrations, few-shot fine-tuning raises success rates by 65–70% over training from scratch.
📝 Abstract
Bimanual coordination is essential for many real-world manipulation tasks, yet learning bimanual robot policies is limited by the scarcity of bimanual robots and datasets. Single-arm robots, however, are widely available in research labs. Can we leverage them to train bimanual robot policies? We present MonoDuo, a framework for learning bimanual manipulation policies using single-arm robot demonstrations paired with human collaboration. MonoDuo collects data by teleoperating a single-arm robot to perform one side of a bimanual task while a human performs the other, then swapping roles to cover both sides. RGB-D observations from wrist-mounted and fixed cameras are augmented into synthetic demonstrations for target bimanual robots using state-of-the-art hand pose estimation, image and point cloud segmentation, and inpainting. These synthetic demonstrations, grounded in real robot kinematics, are used to train bimanual policies. We evaluate MonoDuo on five tasks: box lifting, backpack packing, cloth folding, jacket zipping, and plate handover. Compared to approaches relying solely on human bimanual videos, MonoDuo enables zero-shot deployment on unseen bimanual robot configurations, achieving success rates of up to 70%. With only 25 target robot demonstrations, few-shot fine-tuning further boosts success rates by 65–70% over training from scratch, demonstrating MonoDuo's effectiveness in efficiently transferring knowledge from single-arm robot data to bimanual robot policies.
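To make the role-swapping idea concrete, the following is a minimal illustrative sketch, not MonoDuo's actual implementation. It assumes each collected episode records which side the robot performed, the robot's end-effector poses, and hand poses already estimated for the human side; the names `Episode` and `to_bimanual` and the position-only pose format are hypothetical. The visual steps (segmentation, inpainting) and kinematic feasibility checks described in the abstract are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical pose format: xyz position only, chosen for brevity.
Pose = Tuple[float, float, float]


@dataclass
class Episode:
    """One human-robot collaborative demonstration (hypothetical structure)."""
    robot_side: str               # "left" or "right": the side the robot performed
    robot_poses: List[Pose]       # end-effector poses recorded from the single-arm robot
    human_hand_poses: List[Pose]  # hand poses estimated for the human's side


def to_bimanual(episode: Episode) -> List[Tuple[Pose, Pose]]:
    """Return per-timestep (left, right) end-effector targets.

    The robot's poses fill the side it performed; the estimated human hand
    poses stand in for the other arm of the target bimanual robot.
    """
    if episode.robot_side not in ("left", "right"):
        raise ValueError("robot_side must be 'left' or 'right'")
    if len(episode.robot_poses) != len(episode.human_hand_poses):
        raise ValueError("pose sequences must have the same length")

    pairs = []
    for robot_pose, hand_pose in zip(episode.robot_poses, episode.human_hand_poses):
        if episode.robot_side == "left":
            pairs.append((robot_pose, hand_pose))
        else:
            pairs.append((hand_pose, robot_pose))
    return pairs
```

Under these assumptions, collecting one episode with the robot on the left and a second with the roles swapped yields robot-executed data for both arms of the task, which is the coverage the swap is meant to provide.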