AI Summary
This work addresses the challenges of long-horizon, contact-intensive robotic manipulation, where partial observability and contact uncertainty often lead to unstable subtask transitions and poor coordination. To this end, the authors propose a bilateral control-based multimodal hierarchical imitation learning framework that, for the first time, integrates subtask progress modeling with a keyframe memory mechanism to dynamically condition both high- and low-level policies. This integration enhances contact-awareness and long-term task coordination. Experimental results demonstrate that the proposed approach significantly outperforms flat policies and ablated variants on both single-arm and dual-arm real-world robotic tasks, exhibiting superior robustness and effectiveness in complex contact-rich scenarios.
Abstract
Long-horizon contact-rich robotic manipulation remains challenging due to partial observability and unstable subtask transitions under contact uncertainty. While hierarchical architectures improve temporal reasoning and bilateral imitation learning enables force-aware control, existing approaches often rely on flat policies that struggle with long-horizon coordination. We propose Bi-HIL, a bilateral control-based multimodal hierarchical imitation learning framework for long-horizon manipulation. Bi-HIL stabilizes hierarchical coordination by integrating keyframe memory with a subtask-level progress rate that models phase progression within the active subtask and conditions both high- and low-level policies. We evaluate Bi-HIL on unimanual and bimanual real-robot tasks, demonstrating consistent improvements over flat and ablated variants. The results highlight the importance of explicitly modeling subtask progression together with force-aware control for robust long-horizon manipulation. For additional material, please check: https://mertcookimg.github.io/bi-hil
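To make the idea of a subtask-level progress rate concrete, the sketch below shows one plausible way such a signal could be labeled on a demonstration. It assumes the progress rate is a linear ramp from 0 to 1 within each subtask and that subtask lengths (in timesteps) are already known; neither assumption comes from the abstract, and the function name `progress_rate_labels` is hypothetical. Bi-HIL's actual definition may differ.

```python
from typing import List


def progress_rate_labels(subtask_lengths: List[int]) -> List[float]:
    """Return one progress value in [0, 1] per timestep.

    Assumption (not from the paper): progress rises linearly from 0 at the
    first timestep of a subtask to 1 at its last timestep, then resets for
    the next subtask.
    """
    labels: List[float] = []
    for length in subtask_lengths:
        if length <= 0:
            raise ValueError("each subtask must span at least one timestep")
        if length == 1:
            # A single-step subtask is treated as already complete.
            labels.append(1.0)
            continue
        labels.extend(t / (length - 1) for t in range(length))
    return labels


# Hypothetical demonstration with a 3-step subtask followed by a 2-step one.
labels = progress_rate_labels([3, 2])
```

In a hierarchical policy of the kind described, a scalar like this could be appended to the inputs of both the high-level and low-level networks, so that each knows how far the active subtask has advanced.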