Learning Bimanual Manipulation via Action Chunking and Inter-Arm Coordination with Transformers

📅 2025-03-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Autonomous dual-arm robots in domestic environments face significant challenges in inter-arm coordination due to high degrees of freedom, leading to asynchronous motion execution and limited dexterity in fine manipulation. Method: We propose the Inter-Arm Coordinated Transformer Encoder (IACE), an intermediate encoding layer that jointly models action tokenization and cross-arm temporal alignment, enabling end-to-end learning of coordinated dual-arm policies. Built upon a Transformer architecture, IACE integrates action chunking, heterogeneous arm-specific networks, and a dedicated synchronization encoding mechanism. Contribution/Results: Evaluated on diverse real-world dual-arm manipulation tasks, our approach achieves substantial improvements in task success rates. Results demonstrate IACE’s effectiveness, robustness, and generalizability in complex, dynamic collaborative scenarios—outperforming prior methods in both synchronization fidelity and operational dexterity.

Technology Category

Application Category

📝 Abstract
Robots that can operate autonomously in a human living environment are necessary to have the ability to handle various tasks flexibly. One crucial element is coordinated bimanual movements that enable functions that are difficult to perform with one hand alone. In recent years, learning-based models that focus on the possibilities of bimanual movements have been proposed. However, the high degree of freedom of the robot makes it challenging to reason about control, and the left and right robot arms need to adjust their actions depending on the situation, making it difficult to realize more dexterous tasks. To address the issue, we focus on coordination and efficiency between both arms, particularly for synchronized actions. Therefore, we propose a novel imitation learning architecture that predicts cooperative actions. We differentiate the architecture for both arms and add an intermediate encoder layer, Inter-Arm Coordinated transformer Encoder (IACE), that facilitates synchronization and temporal alignment to ensure smooth and coordinated actions. To verify the effectiveness of our architectures, we perform distinctive bimanual tasks. The experimental results showed that our model demonstrated a high success rate for comparison and suggested a suitable architecture for the policy learning of bimanual manipulation.
Problem

Research questions and friction points this paper is trying to address.

Enhancing robot bimanual coordination for complex tasks.
Addressing high degrees of freedom in robot control.
Improving synchronization and efficiency in dual-arm operations.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Transformers for bimanual robot coordination
Introduces Inter-Arm Coordinated transformer Encoder (IACE)
Focuses on synchronized actions for efficient manipulation
🔎 Similar Papers
No similar papers found.
Tomohiro Motoda
Tomohiro Motoda
National Institute of Advanced Industrial Science and Technology (AIST)
Robotic manipulationdeep learning
R
Ryo Hanai
National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
R
Ryoichi Nakajo
National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
Masaki Murooka
Masaki Murooka
National Institute of Advanced Industrial Science and Technology
Robotics
Floris Erich
Floris Erich
National Institute of Advanced Industrial Science and Technology (AIST, Japan)
Software engineeringartificial intelligenceroboticsmachine learning
Y
Y. Domae
National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan