From Solo to Symphony: Orchestrating Multi-Agent Collaboration with Single-Agent Demos

📅 2025-11-04
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
To address the scarcity of expensive multi-agent demonstration data in multi-agent reinforcement learning (MARL), this paper proposes SoCo, a framework that leverages readily available single-agent (solo) demonstrations to improve collaborative learning. SoCo adopts a two-stage design: (1) pretraining a shared solo policy on single-agent demonstrations, and (2) adapting that policy for cooperation during multi-agent training via a policy fusion mechanism that combines a Mixture-of-Experts (MoE)-like gating selector with a differentiable action editor. Evaluated on diverse cooperative benchmarks, SoCo significantly improves both the training efficiency and final performance of backbone algorithms. These results indicate that solo demonstrations are a scalable and effective complement to costly multi-agent data, and that useful collaborative behavior can be built on individual behavioral priors.

📝 Abstract
Training a team of agents from scratch in multi-agent reinforcement learning (MARL) is highly inefficient, much like asking beginners to play a symphony together without first practicing solo. Existing methods, such as offline or transferable MARL, can ease this burden, but they still rely on costly multi-agent data, which often becomes the bottleneck. In contrast, solo experiences are far easier to obtain in many important scenarios, e.g., collaborative coding, household cooperation, and search-and-rescue. To unlock their potential, we propose Solo-to-Collaborative RL (SoCo), a framework that transfers solo knowledge into cooperative learning. SoCo first pretrains a shared solo policy from solo demonstrations, then adapts it for cooperation during multi-agent training through a policy fusion mechanism that combines an MoE-like gating selector and an action editor. Experiments across diverse cooperative tasks show that SoCo significantly boosts the training efficiency and performance of backbone algorithms. These results demonstrate that solo demonstrations provide a scalable and effective complement to multi-agent data, making cooperative learning more practical and broadly applicable.
Problem

Research questions and friction points this paper is trying to address.

Improving multi-agent training efficiency using single-agent demonstrations
Reducing reliance on costly multi-agent data in cooperative learning
Transferring solo knowledge to enhance collaborative task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretrains shared solo policy from single-agent demonstrations
Adapts solo policy for cooperation via policy fusion
Combines MoE-like gating selector with action editor
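The fusion mechanism above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the authors' implementation: the class names, the linear parameterization of the gate and editors, and the residual form of the action edit are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SoloPolicy:
    """Stand-in for a policy pretrained on solo demonstrations (hypothetical)."""
    def __init__(self, obs_dim, act_dim):
        self.W = rng.normal(size=(act_dim, obs_dim)) * 0.1

    def act(self, obs):
        return np.tanh(self.W @ obs)

class PolicyFusion:
    """Sketch of an MoE-like gating selector over action-editor 'experts':
    the gate weights the experts from the observation, and each expert
    proposes a residual edit applied on top of the solo action."""
    def __init__(self, obs_dim, act_dim, n_experts=3):
        self.gate_W = rng.normal(size=(n_experts, obs_dim)) * 0.1
        # Each editor is a linear map producing a residual correction.
        self.editors = [rng.normal(size=(act_dim, obs_dim)) * 0.1
                        for _ in range(n_experts)]

    def fuse(self, obs, solo_action):
        weights = softmax(self.gate_W @ obs)              # gating over experts
        edits = np.stack([E @ obs for E in self.editors]) # (n_experts, act_dim)
        residual = weights @ edits                        # weighted action edit
        return np.clip(solo_action + residual, -1.0, 1.0)

obs_dim, act_dim = 8, 2
solo = SoloPolicy(obs_dim, act_dim)
fusion = PolicyFusion(obs_dim, act_dim)
obs = rng.normal(size=obs_dim)
action = fusion.fuse(obs, solo.act(obs))
print(action.shape)  # (2,)
```

Because every step is differentiable, a multi-agent RL objective could in principle backpropagate through the gate and editors while the pretrained solo policy supplies the behavioral prior.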
Xun Wang
Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
Zhuoran Li
Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
Yanshan Lin
Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
Hai Zhong
Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
Longbo Huang
Professor, IIIS, Tsinghua University, ACM Distinguished Scientist
Reinforcement Learning (RL) · Deep RL · Machine Learning · Stochastic Networks · Performance Evaluation