Multi-Agent Collaboration via Evolving Orchestration

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the poor scalability of monolithic large language models (LLMs) on complex tasks and the low efficiency of existing multi-agent collaboration, this paper proposes the "Puppeteer" paradigm: a centralized orchestrator, trained via reinforcement learning, dynamically schedules heterogeneous agents in a task-state-driven, adaptive manner. The key contributions are twofold: (1) the first evolvable dynamic orchestration mechanism, overcoming the limitations of static agent topologies; and (2) the empirical finding that compact, cyclic collective reasoning structures naturally emerge during orchestrator training, a phenomenon not previously observed. Experiments show consistent and significant gains over baselines in both closed- and open-domain scenarios: task completion rates and reasoning efficiency improve together, computational overhead drops substantially, and reasoning paths become markedly more compact.

📝 Abstract
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. While recent research explores multi-agent collaboration among LLMs, most approaches rely on static organizational structures that struggle to adapt as task complexity and agent numbers grow, resulting in coordination overhead and inefficiencies. To this end, we propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. This orchestrator is trained via reinforcement learning to adaptively sequence and prioritize agents, enabling flexible and evolvable collective reasoning. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs. Analyses further reveal that the key improvements consistently stem from the emergence of more compact, cyclic reasoning structures under the orchestrator's evolution.
Problem

Research questions and friction points this paper is trying to address.

Monolithic LLMs limit scalability in complex tasks
Static multi-agent structures lack adaptability and efficiency
Dynamic orchestration needed for evolving task states
Innovation

Methods, ideas, or system contributions that make the work stand out.

Puppeteer-style dynamic agent orchestration
Reinforcement learning-trained adaptive sequencing
Emergence of compact, cyclic reasoning structures during training
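The orchestration loop described above can be sketched as follows. This is a toy illustration with hypothetical agent names, not the paper's implementation: the real "puppeteer" is an RL-trained policy over evolving task states, whereas the placeholder policy here just samples agents from fixed softmax scores.

```python
import math
import random

# Toy "puppets": each maps a task state to a new state. In the paper's
# setting these would be heterogeneous LLM agents; here they are
# stand-in functions with hypothetical names.
AGENTS = {
    "planner": lambda state: state + "|plan",
    "solver": lambda state: state + "|solve",
    "critic": lambda state: state + "|check",
}

class Puppeteer:
    """Centralized orchestrator: picks the next agent from the task state.

    A real implementation would train the selection policy with
    reinforcement learning (reward = task success minus compute cost);
    here a fixed softmax over per-agent scores stands in for it.
    """

    def __init__(self, scores=None):
        # Per-agent preference scores an RL update would adjust.
        self.scores = scores or {name: 1.0 for name in AGENTS}

    def select(self, state, rng):
        # Softmax sampling over agents (placeholder for a learned,
        # state-conditioned policy).
        names = list(self.scores)
        weights = [math.exp(self.scores[n]) for n in names]
        return rng.choices(names, weights=weights, k=1)[0]

    def run(self, task, max_steps=6, seed=0):
        rng = random.Random(seed)
        state, trace = task, []
        for _ in range(max_steps):
            name = self.select(state, rng)
            state = AGENTS[name](state)
            trace.append(name)
            if state.endswith("|check"):  # toy termination signal
                break
        return state, trace
```

Usage: `Puppeteer().run("task")` returns the final state and the sequence of agents invoked. Because agents can be revisited, the trace can form the kind of cyclic reasoning structure the paper reports emerging under training.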
👥 Authors
Yufan Dang, Tsinghua University (Natural Language Processing, Machine Learning, Artificial Intelligence)
Cheng Qian, Shanghai Jiao Tong University
Xueheng Luo, Tsinghua University
Jingru Fan, Tsinghua University
Zihao Xie, Tsinghua University
Ruijie Shi, Tsinghua University
Weize Chen, Tsinghua University (NLP, ML)
Cheng Yang, Beijing University of Posts and Telecommunications
Xiaoyin Che, Senior Key Expert Research Scientist @ Siemens Ltd. China (Document Analysis, Natural Language Processing, E-Learning, Knowledge Management)
Ye Tian, Tencent Robotics X
Xuantang Xiong, Tencent Robotics X
Lei Han, Tencent Robotics X
Zhiyuan Liu, Tsinghua University
Maosong Sun, Professor of Computer Science and Technology, Tsinghua University (Natural Language Processing, Artificial Intelligence, Social Computing)