AI Summary
This study addresses the limited responsiveness of large language models (LLMs) to human intervention in multi-agent collaborative tasks and their difficulty in achieving task consensus, i.e., alignment on shared grounding. To this end, we propose the Interruptible Collaborative Roleplayer (ICR), a novel algorithm that models collaborative dynamics as a two-player Modified-Action Markov decision process and, for the first time, integrates reinforcement learning from human feedback (RLHF) to enable partner-aware collaborative optimization. ICR empowers LLMs to proactively detect, internalize, and adapt to real-time human interventions, significantly accelerating consensus convergence and improving consensus quality across multi-turn interactions. Experimental results demonstrate that ICR outperforms baseline methods across diverse collaborative tasks, enhancing both solution-space diversity and task consistency. This work establishes a new paradigm for developing trustworthy, human-aligned, and collaboratively capable LLM agents.
Abstract
Large Language Models (LLMs) are increasingly being deployed in agentic settings where they act as collaborators with humans. It is therefore increasingly important to evaluate their ability to collaborate effectively in multi-turn, multi-party tasks. In this paper, we build on the AI alignment and safe interruptibility literature to offer novel theoretical insights into collaborative behavior between LLM-driven collaborator agents and an intervention agent. Our goal is to learn an ideal partner-aware collaborator that increases the group's common ground (CG), i.e., alignment on task-relevant propositions, by intelligently collecting information provided in interventions by a partner agent. We show that LLM agents trained using standard RLHF and related approaches are naturally inclined to ignore possibly well-meaning interventions, which makes increasing group common ground non-trivial in this setting. We employ a two-player Modified-Action MDP to examine this suboptimal behavior of standard AI agents, and propose the Interruptible Collaborative Roleplayer (ICR), a novel partner-aware learning algorithm for training CG-optimal collaborators. Experiments on multiple collaborative task environments show that ICR, on average, is more capable of promoting successful CG convergence and exploring more diverse solutions in such tasks.
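To make the two-player Modified-Action MDP concrete, the following is a minimal illustrative sketch (not the paper's actual formalization; all names, such as `ModifiedActionMDP` and `intervene`, are hypothetical). The key idea it shows: the collaborator proposes an action, a partner agent may modify it before it reaches the environment, and the gap between the proposed and executed actions is the intervention signal that a partner-aware collaborator should condition on rather than ignore.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical sketch of one step in a two-player Modified-Action MDP:
# the collaborator proposes an action; an intervention agent may
# replace it before the environment transition is applied.


@dataclass
class ModifiedActionMDP:
    # Environment dynamics: (state, executed_action) -> next_state.
    transition: Callable[[str, str], str]
    # Partner policy: (state, proposed_action) -> override action, or
    # None to let the proposed action through unmodified.
    intervene: Callable[[str, str], Optional[str]]
    # Record of (proposed, executed) pairs -- the intervention history
    # a partner-aware learner would treat as evidence about the
    # partner's task-relevant beliefs (common ground).
    history: List[Tuple[str, str]] = field(default_factory=list)

    def step(self, state: str, proposed_action: str) -> str:
        override = self.intervene(state, proposed_action)
        executed = override if override is not None else proposed_action
        self.history.append((proposed_action, executed))
        return self.transition(state, executed)


# Toy usage: the partner vetoes a "guess" action in an "unsure" state,
# substituting a clarifying "ask" action instead.
mdp = ModifiedActionMDP(
    transition=lambda s, a: f"{s}->{a}",
    intervene=lambda s, a: "ask" if (s == "unsure" and a == "guess") else None,
)
next_state = mdp.step("unsure", "guess")
print(next_state)       # "unsure->ask"
print(mdp.history[-1])  # ("guess", "ask")
```

An intervention-ignoring agent would optimize only over its proposed actions; the point the abstract makes is that a CG-optimal collaborator must instead treat the `history` of modifications as information to internalize.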