AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation

📅 2024-12-09
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address data scarcity and poor transferability in bimanual robotic manipulation, this paper proposes AnyBimanual, a framework enabling efficient adaptation from pretrained unimanual policies to general language-conditioned bimanual control. The method introduces three key innovations: (1) a plug-and-play skill manager that dynamically orchestrates unimanual skills and incorporates task-oriented compensation modeling; (2) a soft-masked visual aligner that mitigates observation discrepancies between unimanual and bimanual settings; and (3) a fine-tuning mechanism leveraging linear combinations of skill representations and few-shot bimanual demonstrations. Evaluated on 12 simulated tasks from RLBench2, AnyBimanual achieves a 12.67% absolute success rate improvement over baselines. On nine real-world bimanual tasks, it attains an average success rate of 84.62%. The approach significantly reduces both data requirements and deployment overhead for learning bimanual policies.
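The skill manager's scheduling step, as described, amounts to an instruction-conditioned weighted combination of frozen skill primitives plus a task-oriented compensation term. A minimal NumPy sketch of that idea; all dimensions, parameter names, and the softmax weighting are hypothetical illustrations, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper).
num_skills, skill_dim, instr_dim = 8, 16, 32

# Frozen skill primitives discovered from the pretrained unimanual policy.
skill_primitives = rng.standard_normal((num_skills, skill_dim))

# Learnable skill-manager parameters (randomly initialized here).
W_weights = rng.standard_normal((instr_dim, num_skills))
W_comp = rng.standard_normal((instr_dim, skill_dim))

def schedule_skills(instruction_emb):
    """Linearly combine skill primitives with task-oriented compensation."""
    # Softmax weights over the skill primitives, conditioned on the instruction.
    logits = instruction_emb @ W_weights
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Weighted sum of primitives plus an instruction-dependent compensation term.
    compensation = instruction_emb @ W_comp
    return weights @ skill_primitives + compensation

instr = rng.standard_normal(instr_dim)
skill_rep = schedule_skills(instr)
print(skill_rep.shape)  # (16,)
```

In training, `W_weights` and `W_comp` would be fit from the few-shot bimanual demonstrations while the primitives stay frozen, which is what keeps the adaptation data-efficient.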

πŸ“ Abstract
Performing general language-conditioned bimanual manipulation tasks is of great importance for many applications ranging from household service to industrial assembly. However, collecting bimanual manipulation data is expensive due to the high-dimensional action space, which makes it challenging for conventional methods to handle general bimanual manipulation tasks. In contrast, unimanual policies have recently demonstrated impressive generalizability across a wide range of tasks thanks to scaled model parameters and training data, and can provide shareable manipulation knowledge for bimanual systems. To this end, we propose a plug-and-play method named AnyBimanual, which transfers a pre-trained unimanual policy to a general bimanual manipulation policy with few bimanual demonstrations. Specifically, we first introduce a skill manager that dynamically schedules the skill representations discovered from the pre-trained unimanual policy for bimanual manipulation tasks, linearly combining skill primitives with task-oriented compensation to represent the bimanual manipulation instruction. To mitigate the observation discrepancy between unimanual and bimanual systems, we present a visual aligner that generates soft masks for the visual embedding of the workspace, aiming to align each arm's visual input to the unimanual policy model with that seen during the pretraining stage. AnyBimanual outperforms previous methods on 12 simulated tasks from RLBench2 with a sizable 12.67% improvement in success rate, and experiments on 9 real-world tasks further verify its practicality with an average success rate of 84.62%.
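The visual aligner described above can be pictured as a learned per-arm gate over workspace tokens: each arm's copy of the unimanual policy sees the shared scene re-weighted so it resembles the single-arm observations from pretraining. A minimal sketch, assuming sigmoid soft masks over a patch-token embedding; shapes and names are illustrative, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shape: a patch-token embedding of the shared workspace.
num_patches, embed_dim = 64, 16
visual_emb = rng.standard_normal((num_patches, embed_dim))

# Learnable per-arm mask parameters (randomly initialized here).
mask_logits = {arm: rng.standard_normal(num_patches) for arm in ("left", "right")}

def soft_mask(arm):
    """Sigmoid soft mask that re-weights workspace tokens for one arm."""
    m = 1.0 / (1.0 + np.exp(-mask_logits[arm]))  # per-patch gates in (0, 1)
    return visual_emb * m[:, None]  # down-weight patches irrelevant to this arm

left_view = soft_mask("left")
right_view = soft_mask("right")
print(left_view.shape, right_view.shape)  # (64, 16) (64, 16)
```

Because the masks are soft rather than hard crops, gradients from the few-shot bimanual demonstrations can still flow through every patch while each arm's view is pulled toward its pretraining distribution.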
Problem

Research questions and friction points this paper is trying to address.

Transferring pretrained unimanual policies to bimanual manipulation tasks
Coping with the high-dimensional action space that makes bimanual data collection expensive
Aligning visual inputs between unimanual pretraining and bimanual deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfers a pretrained unimanual policy to bimanual tasks with few demonstrations
Skill manager that dynamically schedules skill primitives with task-oriented compensation
Soft-masked visual aligner that matches each arm's visual input to the pretraining distribution
Guanxing Lu
Tsinghua University
VLA · RL · Robotics · 3D Vision
Tengbo Yu
Tsinghua University
VLA · Computer Vision · Embodied AI
Haoyuan Deng
Nanyang Technological University
Robotics · Imitation Learning · Reinforcement Learning
Season Si Chen
Tsinghua Shenzhen International Graduate School, Tsinghua University
Yansong Tang
Tsinghua Shenzhen International Graduate School, Tsinghua University
Ziwei Wang
School of Electrical and Electronic Engineering, Nanyang Technological University