🤖 AI Summary
Existing multi-agent collaborative systems are hindered by static workflows, sequential scheduling, and heterogeneous interfaces, which make them complex and hard to scale. This work proposes Agent-as-Tool, a unified paradigm that abstracts both agents and tools into a standardized, learnable action space, and introduces ParaManager, a lightweight coordinator that performs state-aware parallel subtask decomposition, delegation, and asynchronous execution. By unifying communication protocols and incorporating explicit state feedback, the framework makes multi-agent collaboration more efficient. Training proceeds in two stages: supervised fine-tuning on trajectories that include a recovery mechanism, followed by reinforcement learning that optimizes task success rate, protocol compliance, response diversity, and reasoning efficiency. Experiments demonstrate that ParaManager achieves strong performance across multiple benchmarks and generalizes robustly to unseen model pools.
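The core of Agent-as-Tool is that a coordinator sees agents and tools through one interface and gets normalized status back. A minimal Python sketch of that idea, assuming a simple async callable interface, is shown here; the names `Action`, `ActionResult`, and `dispatch`, and the toy backends, are hypothetical and not taken from the paper. Failures are folded into a status field the coordinator can read, and independent subtasks run concurrently via `asyncio.gather`.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, List, Tuple

@dataclass
class ActionResult:
    """Hypothetical normalized result: explicit state feedback for the coordinator."""
    name: str
    status: str  # "ok" or "error"
    output: str

@dataclass
class Action:
    """One entry in the unified action space; an agent and a tool look identical."""
    name: str
    description: str
    run: Callable[[str], Awaitable[str]]

    async def __call__(self, query: str) -> ActionResult:
        try:
            return ActionResult(self.name, "ok", await self.run(query))
        except Exception as exc:  # normalize failures instead of crashing the batch
            return ActionResult(self.name, "error", repr(exc))

async def dispatch(registry: Dict[str, Action],
                   subtasks: List[Tuple[str, str]]) -> List[ActionResult]:
    """Run independent subtasks concurrently; results keep the subtask order."""
    calls = [registry[name](query) for name, query in subtasks]
    return await asyncio.gather(*calls)

# Toy backends (hypothetical): one "tool", one "agent", one failing agent.
async def search_tool(q: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a network call
    return f"search results for {q!r}"

async def math_agent(q: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a model call
    return str(sum(int(x) for x in q.split("+")))

async def broken_agent(q: str) -> str:
    raise RuntimeError("model endpoint unavailable")

registry = {a.name: a for a in [
    Action("search", "web search tool", search_tool),
    Action("math", "arithmetic agent", math_agent),
    Action("flaky", "agent that fails", broken_agent),
]}

results = asyncio.run(dispatch(registry, [("search", "ParaManager"),
                                          ("math", "2+3"),
                                          ("flaky", "x")]))
for r in results:
    print(r.status, r.name, r.output)
```

Because `dispatch` awaits all calls together, total latency tracks the slowest subtask rather than the sum, and the failing call comes back as an `error` status instead of aborting the other subtasks.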
📝 Abstract
Multi-agent systems (MAS) demonstrate clear advantages in tackling complex problems by coordinating diverse agents and external tools. However, most existing orchestration methods rely on static workflows or serial agent scheduling, and are further constrained by heterogeneous interface protocols between tools and agents, which leads to high system complexity and poor extensibility. To mitigate these issues, we propose Agent-as-Tool, a unified parallel orchestration paradigm that abstracts both agents and tools into a standardized, learnable action space with protocol normalization and explicit state feedback. Building on this paradigm, we train a lightweight orchestrator, ParaManager, which decouples planning decisions from subtask solving, enabling state-aware parallel subtask decomposition, delegation, and asynchronous execution. ParaManager is trained in two stages: supervised fine-tuning (SFT) on trajectories equipped with recovery mechanisms improves robustness, and reinforcement learning (RL) then balances task success, protocol compliance, diversity, and reasoning efficiency. Experiments show that ParaManager achieves strong performance across multiple benchmarks and generalizes robustly to unseen model pools.
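The abstract names four objectives for the RL stage but does not say how they are combined. One plausible formulation, sketched here purely as an assumption, is a weighted sum with a hard penalty for protocol violations. The `Trajectory` fields, the weights, the distinct-subtask diversity proxy, and the token-budget efficiency term are all hypothetical choices for illustration, not the paper's reward.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    """Hypothetical summary of one orchestration rollout."""
    solved: bool           # did the final answer pass the task checker?
    format_ok: bool        # did every action call follow the unified protocol?
    subtasks: List[str]    # subtask queries the orchestrator issued
    reasoning_tokens: int  # length of the planning trace

def reward(t: Trajectory,
           w_success: float = 1.0,
           w_format: float = 0.5,
           w_div: float = 0.2,
           w_eff: float = 0.2,
           token_budget: int = 1024) -> float:
    """Weighted sum of success, compliance, diversity, and efficiency (illustrative weights)."""
    if not t.format_ok:
        # A protocol violation cannot be offset by the other terms.
        return -w_format
    success = 1.0 if t.solved else 0.0
    # Diversity proxy: fraction of issued subtasks that are distinct.
    diversity = len(set(t.subtasks)) / len(t.subtasks) if t.subtasks else 0.0
    # Efficiency: decays linearly to 0 as the trace reaches the token budget.
    efficiency = max(0.0, 1.0 - t.reasoning_tokens / token_budget)
    return w_success * success + w_format + w_div * diversity + w_eff * efficiency

good = Trajectory(solved=True, format_ok=True,
                  subtasks=["a", "b", "b", "c"], reasoning_tokens=512)
bad = Trajectory(solved=True, format_ok=False,
                 subtasks=["a"], reasoning_tokens=10)
print(reward(good), reward(bad))
```

Gating on protocol compliance, rather than adding it as one more weighted term, is a design choice in this sketch; a softer variant would add a graded compliance score alongside the others.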