🤖 AI Summary
Existing learning-based bimanual manipulation methods generalize poorly to novel scenes, especially in cluttered environments. This paper proposes a sampling-based, GPU-accelerated optimization framework as an alternative. It combines (1) task-specific cost functions designed for bimanual coordination and (2) a customized Model Predictive Path Integral (MPPI) controller that uses GPU-accelerated MuJoCo simulation as its world model to evaluate robot-object interaction. The method is applied to significantly harder versions of tasks from the PerAct² benchmark, such as transferring a ball point-to-point through an obstacle course of static obstacles. It runs in real time on commodity GPUs and transfers successfully from simulation to a real robot. The paper closes with a statistical analysis of sample complexity and robustness.
📝 Abstract
In recent years, dual-arm manipulation has become an area of strong interest in robotics, with end-to-end learning emerging as the predominant strategy for solving bimanual tasks. A critical limitation of such learning-based approaches, however, is their difficulty in generalizing to novel scenarios, especially within cluttered environments. This paper presents an alternative paradigm: a sampling-based optimization framework that utilizes a GPU-accelerated physics simulator as its world model. We demonstrate that this approach can solve complex bimanual manipulation tasks in the presence of static obstacles. Our contribution is a customized Model Predictive Path Integral Control (MPPI) algorithm, guided by carefully designed task-specific cost functions, that uses GPU-accelerated MuJoCo for efficiently evaluating robot-object interaction. We apply this method to solve significantly more challenging versions of tasks from the PerAct$^{2}$ benchmark, such as requiring the point-to-point transfer of a ball through an obstacle course. Furthermore, we establish that our method achieves real-time performance on commodity GPUs and facilitates successful sim-to-real transfer by leveraging unique features within MuJoCo. The paper concludes with a statistical analysis of the sample complexity and robustness, quantifying the performance of our approach. The project website is available at: https://sites.google.com/view/bimanualakslabunitartu .
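To make the sampling-based idea concrete, the following is a minimal sketch of one generic MPPI update, not the paper's customized algorithm. It assumes a toy 2D point mass with velocity control in place of the GPU-accelerated MuJoCo world model, and a hypothetical cost (squared distance to a goal plus a penalty for entering a circular static obstacle). All names and parameters (`rollout_cost`, `mppi_step`, `run_closed_loop`, `sigma`, `lam`, `DT`) are illustrative assumptions.

```python
import numpy as np

DT = 0.05  # integration step of the toy model (assumed value)


def rollout_cost(x0, controls, goal, obstacle, radius):
    """Roll out K control sequences in parallel and return one cost per sample.

    Toy stand-in for the simulator: a 2D point mass driven by velocity commands.
    controls has shape (K, T, 2); the result has shape (K,).
    """
    num_samples, horizon, _ = controls.shape
    x = np.tile(x0, (num_samples, 1))
    cost = np.zeros(num_samples)
    for t in range(horizon):
        x = x + DT * controls[:, t]
        cost += np.sum((x - goal) ** 2, axis=1)  # distance-to-goal term
        inside = np.linalg.norm(x - obstacle, axis=1) < radius
        cost += 1e3 * inside  # penalty for entering the obstacle
    return cost


def mppi_step(x0, u_nominal, goal, obstacle, radius, rng,
              num_samples=256, sigma=0.5, lam=1.0):
    """One MPPI update: perturb, evaluate, and average with softmin weights."""
    horizon, dim = u_nominal.shape
    noise = rng.normal(0.0, sigma, size=(num_samples, horizon, dim))
    costs = rollout_cost(x0, u_nominal[None] + noise, goal, obstacle, radius)
    # Subtract the minimum cost before exponentiating for numerical stability.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return u_nominal + np.einsum("k,ktm->tm", weights, noise)


def run_closed_loop(steps=50, horizon=20, seed=0):
    """Receding-horizon loop: replan, apply the first control, warm-start."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    goal = np.array([1.0, 0.0])
    obstacle, radius = np.array([0.5, 0.3]), 0.2
    u = np.zeros((horizon, 2))
    for _ in range(steps):
        u = mppi_step(x, u, goal, obstacle, radius, rng)
        x = x + DT * u[0]
        u = np.vstack([u[1:], u[-1:]])  # shift the plan by one step
    return x
```

Calling `run_closed_loop()` returns the final position of the point mass after 50 replanning steps. In a setup like the one the abstract describes, the role of `rollout_cost` would be played by batched physics rollouts on the GPU, which is where most of the computation in a sampling-based controller is spent.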