AI Summary
Multi-robot collaboration typically requires extensive expert demonstrations or joint reward design, both of which are costly and impractical for real-world deployment. Method: This paper proposes an explicit collaborative learning framework enabled by brief human guidance (40 minutes), featuring: (1) dynamic switching of which agent the human controls, to emulate human role plasticity; (2) attention-based modeling of Theory of Mind (ToM) for inferring teammate intentions; and (3) integration of a hierarchical policy network with real-time role assignment. Contribution/Results: The framework eliminates reliance on multi-agent demonstrations or shared reward signals, achieving the first instance of explicit multi-agent collaboration learning from single-user supervision. Evaluated on a simulated cooperative hide-and-seek task, it improves success rate by up to 58% over baselines and successfully transfers to a physical multi-robot platform, demonstrating strong generalization and practical applicability.
Abstract
Learning collaborative behaviors is essential for multi-agent systems. Traditionally, multi-agent reinforcement learning solves this implicitly through a joint reward and centralized observations, assuming collaborative behavior will emerge. Other studies propose learning from demonstrations by a group of collaborative experts. Instead, we propose an efficient and explicit way of learning collaborative behaviors in multi-agent systems by leveraging expertise from only a single human. Our insight is that humans can naturally take on various roles in a team. We show that agents can effectively learn to collaborate by allowing a human operator to dynamically switch between controlling agents for a short period and by incorporating a human-like theory-of-mind model of teammates. Our experiments show that our method improves the success rate of a challenging collaborative hide-and-seek task by up to 58% with only 40 minutes of human guidance. We further demonstrate that our findings transfer to the real world by conducting multi-robot experiments.