🤖 AI Summary
This work addresses the challenge of adapting high-dimensional bimanual coordination skills to novel tasks online and zero-shot during deployment, guided by natural language feedback from human users. To this end, we propose BiSAIL, a hierarchical reason-then-modulate framework: at the high level, it fuses language feedback with visual context to infer task-specific adaptation objectives; at the low level, a diffusion model modulates bimanual actions online to meet those objectives. BiSAIL lets non-expert users flexibly customize robot behaviors through verbal corrections without any additional training. Real-robot evaluations across six bimanual tasks and two dual-arm platforms show that BiSAIL significantly outperforms existing methods in human-in-the-loop adaptability, task generalization, and cross-embodiment scalability.
📝 Abstract
Developing general-purpose robots that operate autonomously in human living environments requires the ability to adapt to continuously evolving task conditions. However, adapting high-dimensional, coordinated bimanual skills to novel task variations at deployment remains a fundamental challenge. In this work, we present BiSAIL (Bimanual Skill Adaptation via Interactive Language), a novel framework that enables zero-shot online adaptation of offline-learned bimanual skills through interactive language feedback. The key idea of BiSAIL is a hierarchical reason-then-modulate paradigm: it first infers generalized adaptation objectives from multimodal task variations, and then adapts bimanual motions via diffusion modulation to achieve the inferred objectives. Extensive real-robot experiments across six bimanual tasks and two dual-arm platforms demonstrate that BiSAIL significantly outperforms existing methods in human-in-the-loop adaptability, task generalization, and cross-embodiment scalability. This work enables the development of adaptive bimanual assistants that non-expert users can flexibly customize via intuitive verbal corrections. Experimental videos and code are available at https://rip4kobe.github.io/BiSAIL/.
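
To make the reason-then-modulate idea concrete, the following is a minimal toy sketch and not BiSAIL's implementation. It assumes a keyword lookup in place of the multimodal reasoner, a simple pull toward a nominal pose in place of the learned diffusion denoiser, and a quadratic objective whose gradient steers each refinement step. The names `infer_objective` and `modulated_sampling`, the 5 cm offsets, and the gripper heights are all hypothetical.

```python
import random


def infer_objective(feedback: str) -> float:
    """Hypothetical stand-in for the high-level reasoner.

    Maps a verbal correction to a height offset in meters. A real system
    would reason over language and visual context; this is a keyword lookup.
    """
    text = feedback.lower()
    if "higher" in text:
        return 0.05
    if "lower" in text:
        return -0.05
    return 0.0


def modulated_sampling(nominal, offset, steps=50, guidance=1.0, seed=0):
    """Toy guided refinement from noise toward a modulated bimanual pose.

    Each step blends a pull toward the nominal pose (stand-in for the
    learned denoiser) with a pull toward the inferred objective, then adds
    noise that is annealed to zero by the final step.
    """
    rng = random.Random(seed)
    target = [h + offset for h in nominal]
    x = [rng.gauss(0.0, 1.0) for _ in nominal]  # start from pure noise
    for k in range(steps):
        noise_scale = 0.05 * (1.0 - (k + 1) / steps)  # zero at the last step
        updated = []
        for xi, ni, ti in zip(x, nominal, target):
            prior_pull = ni - xi      # stand-in for the learned denoiser
            objective_pull = ti - xi  # negative gradient of 0.5 * (xi - ti)**2
            xi += 0.5 * ((1.0 - guidance) * prior_pull + guidance * objective_pull)
            xi += noise_scale * rng.gauss(0.0, 1.0)
            updated.append(xi)
        x = updated
    return x


if __name__ == "__main__":
    nominal_heights = [0.20, 0.20]  # left and right gripper heights (m), illustrative
    offset = infer_objective("lift the tray a bit higher")
    print(modulated_sampling(nominal_heights, offset))
```

In this toy, `guidance=0.0` settles on the nominal heights and `guidance=1.0` settles near the shifted target, so the guidance weight controls how strongly the verbal correction overrides the offline-learned behavior. No retraining occurs in either case, which is the property the zero-shot claim refers to.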