R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address a key bottleneck in multi-agent imitation learning, namely its reliance on synchronized joint demonstrations, this paper proposes Round-Robin Behavior Cloning (R2BC). R2BC enables a single human operator to collect demonstration data asynchronously by alternating control over the agents, teleoperating one at a time, which eliminates the need to demonstrate in the joint action space. Built on the behavior cloning framework, it combines serialized demonstration collection with decentralized policy training, enabling cooperative policy learning while respecting each agent's individual observation constraints. Across four simulated multi-agent tasks, R2BC matches or surpasses an oracle baseline trained on privileged synchronized demonstrations; it further succeeds on two real-robot collaborative tasks, demonstrating practical applicability. The core contribution is a systematic approach to learning from asynchronous, single-operator multi-agent demonstrations, significantly lowering the data-collection barrier for multi-agent imitation learning.
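The round-robin scheme described above can be sketched in code. This is a minimal illustrative model, not the paper's implementation: the environment interface (`env_step`), the discrete observations, and the count-based per-agent policy are all simplifying assumptions standing in for real teleoperation and neural behavior-cloning policies.

```python
class Policy:
    """Toy per-agent BC policy: maps a discrete observation to the most
    frequently demonstrated action (a stand-in for a learned policy)."""

    def __init__(self, default_action=0):
        self.counts = {}  # obs -> {action: count}
        self.default_action = default_action

    def fit(self, demos):
        # Behavior cloning on (observation, action) pairs for this agent only.
        for obs, act in demos:
            self.counts.setdefault(obs, {})
            self.counts[obs][act] = self.counts[obs].get(act, 0) + 1

    def act(self, obs):
        if obs not in self.counts:
            return self.default_action
        return max(self.counts[obs], key=self.counts[obs].get)


def round_robin_collect(env_step, policies, human_action, episodes, horizon):
    """Round-robin collection: in each episode the human controls one agent
    (cycling through agents) while the others execute their current policies.
    Only the human-controlled agent's (obs, action) pairs are recorded, so no
    joint-action demonstrations are ever needed."""
    n = len(policies)
    datasets = [[] for _ in range(n)]
    obs = [0] * n
    for ep in range(episodes):
        controlled = ep % n  # alternate control, one agent at a time
        for _ in range(horizon):
            actions = [policies[i].act(obs[i]) for i in range(n)]
            actions[controlled] = human_action(controlled, obs[controlled])
            datasets[controlled].append((obs[controlled], actions[controlled]))
            obs = env_step(obs, actions)  # environment transition (assumed interface)
        # Incrementally retrain the controlled agent after its demonstration.
        policies[controlled].fit(datasets[controlled])
    return policies, datasets
```

Each agent's dataset contains only its own observations and actions, so the resulting policies are decentralized by construction; the other agents acting from their current (partially trained) policies during collection is what lets cooperative behavior emerge incrementally.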

📝 Abstract
Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.
Problem

Research questions and friction points this paper addresses.

How to extend imitation learning from single-robot settings to multi-agent systems
How a single human can provide demonstrations to a team of collaborating robots without acting in the joint action space
Whether sequential, single-agent demonstrations can match privileged synchronized demonstrations in performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trains multi-robot systems through sequential, single-agent demonstrations (round-robin teleoperation)
Enables a single human operator to incrementally teach collaborative robot teams
Avoids the need for demonstrations in the joint multi-agent action space