🤖 AI Summary
This work addresses the inefficient routing, poor adaptability, and weak fault tolerance that arise in multi-agent large language model systems when roles are statically assigned and control is centralized. To overcome these limitations, the authors propose a decentralized, self-organizing coordination mechanism that needs no predefined roles. Agent selection is formulated as an online multi-armed bandit problem, and a two-stage dynamic beacon protocol is introduced: a lightweight candidate filtering step first narrows the pool of potential agents, and an adaptive LinUCB-based selector then routes each subtask using contextual features of both the task and the agent states. The policy is continuously refined using delayed end-to-end feedback. Experiments in simulated environments and on real-world large language model benchmarks show that the approach improves routing efficiency, recovers on its own from distribution shifts and agent failures, and scales favorably.
📝 Abstract
Multi-agent large language model systems can tackle complex multi-step tasks by decomposing work and coordinating specialized behaviors. However, current coordination mechanisms typically rely on statically assigned roles and centralized controllers. As agent pools and task distributions evolve, these design choices lead to inefficient routing, poor adaptability, and fragile fault recovery. We introduce Symphony-Coord, a decentralized multi-agent framework that casts agent selection as an online multi-armed bandit problem, enabling roles to emerge organically through interaction. The framework employs a two-stage dynamic beacon protocol: (i) a lightweight candidate screening mechanism that limits communication and computational overhead; (ii) an adaptive LinUCB selector that routes subtasks based on context features derived from task requirements and agent states, and that is continuously updated through delayed end-to-end feedback. Under standard linear realizability assumptions, we provide sublinear regret bounds, indicating that the system converges toward near-optimal allocation. Simulation experiments and real-world large language model benchmarks show that Symphony-Coord improves task routing efficiency and self-heals under distribution shifts and agent failures, yielding a scalable coordination mechanism without predefined roles.
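To make the two-stage protocol concrete, the following is a minimal sketch of screening followed by LinUCB routing with delayed feedback. It is not the paper's implementation. The names `screen`, `LinUCBArm`, and `Router`, the capability vectors, the `load` and `alive` fields, and the choice of context vector (task requirements, agent load, and a bias term) are all illustrative assumptions. The selector uses the standard disjoint LinUCB score: the ridge-regression reward estimate plus `alpha` times a confidence width.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    return [dot(row, v) for row in m]


class LinUCBArm:
    """Ridge-regression state for one agent (disjoint LinUCB)."""

    def __init__(self, dim):
        # a_inv holds the inverse of A = I + sum(x x^T); b holds sum(reward * x).
        self.a_inv = [[float(i == j) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def ucb(self, x, alpha):
        theta = matvec(self.a_inv, self.b)
        width = math.sqrt(max(dot(x, matvec(self.a_inv, x)), 0.0))
        return dot(theta, x) + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after A += x x^T.
        ax = matvec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def screen(task_req, agents, k):
    """Stage 1 (assumed rule): keep the k live agents whose advertised
    capability vector best matches the task requirement vector."""
    alive = [name for name, a in agents.items() if a["alive"]]
    alive.sort(key=lambda n: dot(task_req, agents[n]["caps"]), reverse=True)
    return alive[:k]


class Router:
    """Stage 2: LinUCB over the screened candidates, with delayed feedback."""

    def __init__(self, dim, alpha=1.0):
        self.dim, self.alpha = dim, alpha
        self.arms = {}
        self.pending = {}  # task_id -> (agent name, context) awaiting reward

    def route(self, task_id, task_req, agents, k=2):
        best, best_score, best_x = None, -math.inf, None
        for name in screen(task_req, agents, k):
            # Hypothetical context: task requirements + agent load + bias.
            x = list(task_req) + [agents[name]["load"], 1.0]
            arm = self.arms.setdefault(name, LinUCBArm(self.dim))
            score = arm.ucb(x, self.alpha)
            if score > best_score:
                best, best_score, best_x = name, score, x
        if best is not None:
            self.pending[task_id] = (best, best_x)
        return best

    def feedback(self, task_id, reward):
        # Called once the end-to-end outcome of the task is known.
        name, x = self.pending.pop(task_id)
        self.arms[name].update(x, reward)
```

In this sketch an agent that fails is simply dropped at the screening stage, so the selector falls back to the remaining candidates without any role reassignment. Storing the context in `pending` lets the reward arrive long after the routing decision, which is one simple way to handle delayed end-to-end feedback.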