🤖 AI Summary
Coordinating parallel execution and sequential collaboration remains challenging in long-horizon, high-contact-complexity bimanual manipulation tasks.
Method: This paper proposes a hierarchical framework that treats the task as an integrated skill planning and scheduling problem: (i) at the low level, a library of single-arm and bimanual primitive skills is trained with reinforcement learning in GPU-accelerated simulation; (ii) at the high level, a Transformer-based planner, trained on a dataset of skill compositions, acts as the scheduler and jointly predicts the discrete skill schedule and each skill's continuous parameters. The resulting schedules can invoke skills on both arms simultaneously as well as in sequence.
Contribution/Results: The framework goes beyond purely sequential decision-making by supporting simultaneous skill invocation. The authors report higher success rates on complex, contact-rich tasks than end-to-end reinforcement learning, and more efficient, better-coordinated behaviors than traditional sequential-only planners.
📝 Abstract
Long-horizon contact-rich bimanual manipulation presents a significant challenge, requiring complex coordination involving a mixture of parallel execution and sequential collaboration between arms. In this paper, we introduce a hierarchical framework that frames this challenge as an integrated skill planning & scheduling problem, going beyond purely sequential decision-making to support simultaneous skill invocation. Our approach is built upon a library of single-arm and bimanual primitive skills, each trained using Reinforcement Learning (RL) in GPU-accelerated simulation. We then train a Transformer-based planner on a dataset of skill compositions to act as a high-level scheduler, simultaneously predicting the discrete schedule of skills as well as their continuous parameters. We demonstrate that our method achieves higher success rates on complex, contact-rich tasks than end-to-end RL approaches and produces more efficient, coordinated behaviors than traditional sequential-only planners.
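To make the idea of a schedule that mixes parallel and sequential skill use more concrete, the following is a minimal sketch of one possible representation. It is not the authors' implementation: the class names (`SkillCall`, `ScheduleStep`), the skill names, the parameter values, and the per-skill `duration` field are all hypothetical and chosen only for illustration. Each step either assigns independent skills to the two arms, which then run in parallel, or invokes a single bimanual skill that occupies both arms.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SkillCall:
    """One skill invocation: a discrete choice plus continuous parameters."""
    name: str                   # hypothetical skill name from the library
    params: Tuple[float, ...]   # continuous parameters, e.g. a target position
    duration: float             # assumed execution time in seconds


@dataclass(frozen=True)
class ScheduleStep:
    """One step of the schedule: per-arm skills in parallel, or one bimanual skill."""
    left: Optional[SkillCall] = None
    right: Optional[SkillCall] = None
    bimanual: Optional[SkillCall] = None

    def __post_init__(self) -> None:
        has_single = self.left is not None or self.right is not None
        if self.bimanual is not None and has_single:
            raise ValueError("a bimanual skill occupies both arms")
        if self.bimanual is None and not has_single:
            raise ValueError("a step needs at least one skill")

    def calls(self) -> List[SkillCall]:
        """Return the skills invoked in this step."""
        candidates = (self.left, self.right, self.bimanual)
        return [c for c in candidates if c is not None]


def parallel_makespan(schedule: List[ScheduleStep]) -> float:
    """Total time when skills within a step run simultaneously."""
    return sum(max(c.duration for c in step.calls()) for step in schedule)


def sequential_makespan(schedule: List[ScheduleStep]) -> float:
    """Total time when the same skills are executed strictly one after another."""
    return sum(c.duration for step in schedule for c in step.calls())


# Illustrative schedule: both arms act at once, then collaborate on a lift.
schedule = [
    ScheduleStep(
        left=SkillCall("reach", (0.3, 0.1, 0.2), duration=1.0),
        right=SkillCall("push", (0.5, -0.2, 0.0), duration=2.0),
    ),
    ScheduleStep(bimanual=SkillCall("lift", (0.4, 0.0, 0.3), duration=1.5)),
]

print(parallel_makespan(schedule), sequential_makespan(schedule))
```

Under these assumed durations, the parallel schedule finishes sooner than executing the same three skills one at a time, which is the kind of efficiency gain the abstract attributes to going beyond sequential-only planning. In the paper's setting a learned planner would produce such a schedule; here it is written by hand.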