BiKC+: Bimanual Hierarchical Imitation With Keypose-Conditioned Coordination-Aware Consistency Policies

📅 2026-01-17

🏛️ IEEE Transactions on Automation Science and Engineering

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenges of coordination complexity and low inference efficiency in multi-stage collaborative manipulation for dual-arm robots. The authors propose a hierarchical imitation learning framework that integrates a high-level key-pose predictor with a low-level trajectory generator. By leveraging task semantics and robot motion features, the framework identifies key poses as subgoals to guide the coordinated generation of bimanual actions. A novel consistency policy conditioned on key poses enables single-step inference, preserving the multi-stage task structure while significantly accelerating execution. Experimental results demonstrate that the proposed method outperforms existing baselines in both simulation and real-world environments, achieving substantial improvements in task success rate and operational efficiency.

📝 Abstract

Robots are essential in industrial manufacturing due to their reliability and efficiency. They excel at simple, repetitive unimanual tasks but still struggle with bimanual manipulation. This difficulty arises from the complexity of coordinating two arms and handling multi-stage processes. The recent integration of generative models into imitation learning (IL) has made progress on some of these challenges. However, few approaches explicitly consider the multi-stage nature of bimanual tasks while also emphasizing inference speed. In multi-stage tasks, a failure or delay at any stage can cascade over time, degrading the success and efficiency of subsequent sub-stages and ultimately hindering overall task performance. In this paper, we propose a novel keypose-conditioned coordination-aware consistency policy tailored for bimanual manipulation. Our framework instantiates hierarchical imitation learning with a high-level keypose predictor and a low-level trajectory generator. The predicted keyposes serve as sub-goals for trajectory generation, indicating the targets of individual sub-stages. The trajectory generator is formulated as a consistency model that generates action sequences from historical observations and predicted keyposes in a single inference step. In particular, we devise an approach for identifying bimanual keyposes that considers both robot-centric action features and task-centric operation styles. Simulation and real-world experiments show that our approach significantly outperforms baseline methods in success rate and operational efficiency.

Note to Practitioners: Bimanual manipulation typically involves multiple stages that require efficient interaction between two arms, presenting both step-wise and stage-wise challenges for imitation learning (IL) systems. Existing approaches, particularly those based on generative models, have not explicitly considered the impact of these multiple stages, which negatively affects overall success rates and operational efficiency. This paper proposes a novel hierarchical IL framework, BiKC+, which learns from distributionally multi-modal demonstrations, generates actions through one-step inference, and addresses bimanual multi-stage manipulation. The high-level keypose predictor forecasts the next target keypose in joint space, which serves both as guidance for low-level actions and as an indicator of sub-stage completion. This enhances per-stage reliability and improves overall success rates. The low-level trajectory predictor is formulated as a consistency model that generates action sequences in a single inference step, thereby increasing inference speed and enhancing operational efficiency. Comprehensive experiments indicate the potential of the proposed framework for real-world manufacturing and industrial settings. Future research will integrate additional sensory information, such as forces and torques, to accurately reproduce subtle motions arising from contact and force interactions, thereby strengthening the approach for fine-grained manipulation.
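
The two-level control flow described in the abstract can be illustrated with a small toy loop. This is only a sketch under stated assumptions, not the authors' implementation: the names `predict_keypose`, `consistency_policy`, `reached`, and `run_episode`, the 14-dimensional joint state, the chunk length, the tolerance, and the fixed stage targets are all hypothetical, and simple closed-form stand-ins replace the learned networks. What it shows is the structure: a keypose acts as the sub-goal, one function evaluation produces a whole action chunk, and reaching the keypose in joint space marks the sub-stage as complete.

```python
import numpy as np

DOF = 14      # assumption: two 7-DoF arms, state expressed in joint space
HORIZON = 8   # assumption: number of actions in each generated chunk


def predict_keypose(obs, stage_targets, stage):
    """Hypothetical high-level predictor.

    A lookup into fixed targets stands in for a learned network that
    forecasts the next target keypose in joint space.
    """
    return stage_targets[stage]


def consistency_policy(obs, keypose, rng):
    """Hypothetical low-level generator.

    A single function evaluation maps sampled noise, the observation, and
    the keypose to an action chunk. The interpolation below is a stand-in
    for a trained consistency model, not the real network.
    """
    noise = rng.normal(size=(HORIZON, DOF))
    alphas = np.linspace(1.0 / HORIZON, 1.0, HORIZON)[:, None]
    return obs + alphas * (keypose - obs) + 1e-3 * noise


def reached(obs, keypose, tol=0.05):
    """Sub-stage completion check: joint-space distance to the keypose."""
    return bool(np.max(np.abs(obs - keypose)) < tol)


def run_episode(stage_targets, max_chunks=20, seed=0):
    rng = np.random.default_rng(seed)
    obs = np.zeros(DOF)
    stage, calls = 0, 0
    while stage < len(stage_targets) and calls < max_chunks:
        keypose = predict_keypose(obs, stage_targets, stage)
        chunk = consistency_policy(obs, keypose, rng)  # one inference step
        calls += 1
        obs = chunk[-1]  # toy environment: the robot tracks the chunk exactly
        if reached(obs, keypose):
            stage += 1   # keypose reached, move on to the next sub-stage
    return stage, calls


# Three made-up sub-stage targets, purely for illustration.
STAGE_TARGETS = [np.full(DOF, 0.3), np.full(DOF, -0.2), np.full(DOF, 0.5)]
stages_done, policy_calls = run_episode(STAGE_TARGETS)
```

In this toy setting each sub-stage costs exactly one policy call, which is the property the single-step formulation is meant to provide. A multi-step diffusion sampler would instead need several network evaluations per chunk.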

Problem

Research questions and friction points this paper is trying to address.

bimanual manipulation

multi-stage tasks

coordination

imitation learning

inference speed

Innovation

Methods, ideas, or system contributions that make the work stand out.

bimanual manipulation

hierarchical imitation learning

keypose-conditioned policy