Small Model as Master Orchestrator: Learning Unified Agent-Tool Orchestration with Parallel Subtask Decomposition

📅 2026-04-18

🤖 AI Summary

Existing multi-agent collaborative systems are hindered by static workflows, sequential scheduling, and heterogeneous interfaces, leading to high complexity and poor scalability. This work proposes Agent-as-Tool, a unified paradigm that abstracts both agents and tools into a standardized, learnable action space, and introduces ParaManager, a lightweight coordinator that performs state-aware parallel subtask decomposition, delegation, and asynchronous execution. By unifying communication protocols and incorporating explicit state feedback, the framework facilitates efficient multi-agent collaboration. Training proceeds in two stages: supervised fine-tuning on trajectories equipped with recovery mechanisms, followed by reinforcement learning that targets task success rate, protocol compliance, response diversity, and reasoning efficiency. Experiments demonstrate that ParaManager achieves strong performance across multiple benchmarks and exhibits robust generalization to unseen agent pools.

📝 Abstract

Multi-agent systems (MAS) demonstrate clear advantages in tackling complex problems by coordinating diverse agents and external tools. However, most existing orchestration methods rely on static workflows or serial agent scheduling, and are further constrained by heterogeneous interface protocols between tools and agents. This leads to high system complexity and poor extensibility. To mitigate these issues, we propose Agent-as-Tool, a unified parallel orchestration paradigm that abstracts both agents and tools into a standardized, learnable action space with protocol normalization and explicit state feedback. Building on this paradigm, we train a lightweight orchestrator, ParaManager, which decouples planning decisions from subtask solving, enabling state-aware parallel subtask decomposition, delegation, and asynchronous execution. For training, we adopt a two-stage pipeline. The first stage improves robustness through supervised fine-tuning (SFT) on trajectories equipped with recovery mechanisms; the second applies reinforcement learning (RL) to balance task success, protocol compliance, diversity, and reasoning efficiency. Experiments show that ParaManager achieves strong performance across multiple benchmarks and exhibits robust generalization to unseen model pools.
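
The idea of placing agents and tools behind one standardized interface, dispatching subtasks in parallel, and returning explicit state feedback can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Call` and `Feedback` records, the `REGISTRY` mapping, the toy `calculator` and `summarizer` actions, and the `"ok"`/`"error"` status values are all hypothetical names, and the orchestrator's learned planning is replaced by a hand-written list of calls.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List

# Hypothetical unified signature: every agent or tool takes a text subtask
# and returns text. This is an illustrative assumption, not the paper's protocol.
Action = Callable[[str], Awaitable[str]]


@dataclass
class Call:
    action: str   # name of a registered agent or tool
    subtask: str  # the piece of work delegated to it


@dataclass
class Feedback:
    action: str
    subtask: str
    status: str   # "ok" or "error" (hypothetical status values)
    output: str   # result text, or an error description


async def calculator(expr: str) -> str:
    # Toy "tool": adds integers written as "a+b+c".
    return str(sum(int(x) for x in expr.split("+")))


async def summarizer(text: str) -> str:
    # Toy "agent": pretends to work, then returns the first sentence.
    await asyncio.sleep(0.01)
    return text.split(".")[0].strip() + "."


# Agents and tools live in the same registry and are called the same way.
REGISTRY: Dict[str, Action] = {"calculator": calculator, "summarizer": summarizer}


async def run_call(call: Call) -> Feedback:
    # Failures become explicit state feedback instead of raised exceptions,
    # so a coordinator could react to them (for example, by re-planning).
    try:
        output = await REGISTRY[call.action](call.subtask)
        return Feedback(call.action, call.subtask, "ok", output)
    except Exception as exc:
        message = f"{type(exc).__name__}: {exc}"
        return Feedback(call.action, call.subtask, "error", message)


async def dispatch(calls: List[Call]) -> List[Feedback]:
    # Run all subtasks concurrently; gather preserves the input order.
    return list(await asyncio.gather(*(run_call(c) for c in calls)))


def run_parallel(calls: List[Call]) -> List[Feedback]:
    return asyncio.run(dispatch(calls))
```

For example, `run_parallel([Call("calculator", "2+3"), Call("missing", "x")])` returns one `Feedback` with status `"ok"` and one with status `"error"`, which shows how a single batch can mix successful results with recoverable failures. In this sketch the decomposition is fixed by the caller; in the paper it is the trained orchestrator that decides which subtasks to issue and how to respond to the feedback.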

Problem

Research questions and friction points this paper is trying to address.

multi-agent systems

agent-tool orchestration

heterogeneous interfaces

system extensibility

parallel subtask decomposition

Innovation

Methods, ideas, or system contributions that make the work stand out.

parallel orchestration

agent-tool unification

state-aware decomposition