🤖 AI Summary
In long-horizon human-robot collaboration, a human partner's physical behavior, willingness to assist, and understanding of the robot's capabilities can change over time. To address this, the paper proposes MICoBot, a novel framework that introduces mixed-initiative dialog to collaborative manipulation for the first time, allowing both human and robot to initiate requests and negotiate task allocation over multiple turns. MICoBot combines a large language model (LLM), a simulation-pretrained affordance model, an estimate of the human's availability to help, and a hierarchical planner to make decisions at three levels: collaboration strategy generation, task allocation, and action execution. In a 27-hour user study (N=18) on a physical robot, MICoBot achieves a 23.6% higher task success rate than an LLM-only baseline and other allocation methods, and significantly improves user satisfaction. These results support the value of dynamic negotiation for flexible and robust collaboration.
📝 Abstract
Effective robotic systems for long-horizon human-robot collaboration must adapt to a wide range of human partners, whose physical behavior, willingness to assist, and understanding of the robot's capabilities may change over time. This demands a tightly coupled communication loop that grants both agents the flexibility to propose, accept, or decline requests as they coordinate toward completing the task effectively. We apply a Mixed-Initiative dialog paradigm to Collaborative human-roBot teaming and propose MICoBot, a system that handles the common scenario where both agents, using natural language, take initiative in formulating, accepting, or rejecting proposals on who can best complete different steps of a task. To handle diverse, task-directed dialog and find successful collaborative strategies that minimize human effort, MICoBot makes decisions at three levels: (1) a meta-planner considers human dialog to formulate and code a high-level collaboration strategy, (2) a planner optimally allocates the remaining steps to either agent based on the robot's capabilities (measured by a simulation-pretrained affordance model) and the human's estimated availability to help, and (3) an action executor decides the low-level actions to perform or words to say to the human. Our extensive evaluations in simulation and in the real world -- on a physical robot with 18 unique human participants over 27 hours -- demonstrate the ability of our method to effectively collaborate with diverse human users, yielding significantly higher task success and a better user experience than a pure LLM baseline and other agent allocation models. See additional videos and materials at https://robin-lab.cs.utexas.edu/MicoBot/.
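To make the second decision level more concrete, the following is a minimal sketch of step allocation between robot and human. It is not the authors' implementation: the `Step` fields, the `allocate` function, the `failure_penalty` parameter, and the cost formulas are all hypothetical, chosen only to illustrate how a robot success score (standing in for an affordance-model output) and an estimate of human availability might be traded off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    robot_success_prob: float  # assumed stand-in for an affordance-model score in [0, 1]
    human_effort: float        # assumed cost of asking the human to do this step


def allocate(steps, human_availability, failure_penalty=10.0):
    """Assign each step to 'robot' or 'human' by comparing two assumed costs."""
    plan = {}
    for step in steps:
        # Expected cost of the robot attempting the step and possibly failing.
        robot_cost = (1.0 - step.robot_success_prob) * failure_penalty
        # Asking costs more when the human is estimated to be less available.
        human_cost = step.human_effort / max(human_availability, 1e-6)
        plan[step.name] = "robot" if robot_cost <= human_cost else "human"
    return plan


# Hypothetical task with one easy step and one hard step for the robot.
steps = [
    Step("pick up cup", robot_success_prob=0.9, human_effort=2.0),
    Step("open jar", robot_success_prob=0.2, human_effort=3.0),
]
plan_available = allocate(steps, human_availability=1.0)
plan_busy = allocate(steps, human_availability=0.25)
```

Under these assumed costs, the hard step goes to the human when they are fully available, and falls back to the robot when the availability estimate drops. In a system like the one described, that estimate would presumably be updated from the dialog, for example after the human declines a request.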