Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems

📅 2026-05-15

🤖 AI Summary
Existing multi-agent collaboration paradigms, whether parallel or sequential, struggle to balance communication overhead, latency, and response accuracy at the same time. This work proposes Nexa, a response-conditioned hybrid execution framework. After an initial parallel execution phase, Nexa uses a lightweight Transformer-based policy network to embed the agents' responses into a shared semantic space and predict a sparse directed acyclic communication graph. An empty graph keeps execution purely parallel, while a non-empty graph triggers a single round of sequential message propagation. The approach requires no external LLM judges or reward models and no hand-crafted topology search; the policy is trained with policy-gradient optimization. The learned policy is reported to transfer across tasks, agent counts, and base models, and experiments are reported to show lower communication cost and latency together with improved accuracy.
πŸ“ Abstract
Multi-agent systems can solve complex tasks through collaboration between multiple Large Language Model agents. Existing collaboration frameworks typically operate in either a parallel or a sequential mode. In the parallel mode, agents respond independently to queries, and their responses are then aggregated. In contrast, sequential systems allow agents to communicate via a directed topology and refine one another's responses step by step. However, neither mode achieves the desired objective of minimizing communication and latency while simultaneously maximizing the accuracy of the final response. In this work, we introduce a hybrid paradigm called Nexa, a trainable response-conditioned policy that bridges the gap between the two modes. Nexa begins with a parallel execution stage, embeds the resulting responses into a shared semantic space, and then predicts a sparse directed acyclic communication graph. If the graph is empty, the system remains purely parallel; if it is non-empty, the system performs one sequential message propagation. The policy is a lightweight transformer model, and the method avoids the need for external LLM judges or reward models, as well as hand-crafted test-time topology search. We formalize this hybrid execution problem, show that the resulting graph is acyclic by construction and that the framework strictly subsumes pure parallel execution, and present a training procedure based on policy-gradient optimization. Results demonstrate that the response-conditioned policy learned by Nexa under one setting can be reused when the number of agents, the task, or the underlying agent changes, which underlines the generalizability of the learned communication policy.
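The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `predict_edges`, `hybrid_run`, and `score_fn`, as well as the thresholding rule, are hypothetical. In particular, the abstract does not say how acyclicity is enforced; the sketch assumes one simple way to get it, namely a fixed agent order in which only edges from a lower to a higher index are allowed. The learned transformer policy is replaced by an arbitrary scoring function supplied by the caller.

```python
from typing import Callable, List, Tuple

# An agent maps (query, messages from predecessor agents) to a response.
Agent = Callable[[str, List[str]], str]
# Stand-in for the learned policy: maps responses to an n x n score matrix.
ScoreFn = Callable[[List[str]], List[List[float]]]


def predict_edges(scores: List[List[float]], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Keep edge i -> j only if i < j and its score exceeds the threshold.

    Assumption: restricting edges to i < j under a fixed agent order means
    every edge points forward, so the graph cannot contain a cycle.
    """
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if scores[i][j] > threshold]


def hybrid_run(
    query: str, agents: List[Agent], score_fn: ScoreFn, threshold: float = 0.5
) -> Tuple[List[str], List[Tuple[int, int]]]:
    # Stage 1: every agent answers independently. The calls do not depend on
    # each other, so a real system could issue them concurrently.
    responses = [agent(query, []) for agent in agents]

    # Stage 2: predict a sparse communication graph from the responses.
    edges = predict_edges(score_fn(responses), threshold)
    if not edges:
        # Empty graph: the run stays purely parallel.
        return responses, edges

    # Stage 3: one sequential propagation. Index order is a topological
    # order here because every edge goes from a lower to a higher index.
    predecessors: List[List[int]] = [[] for _ in agents]
    for i, j in edges:
        predecessors[j].append(i)

    refined = list(responses)
    for j, agent in enumerate(agents):
        if predecessors[j]:
            messages = [refined[i] for i in predecessors[j]]
            refined[j] = agent(query, messages)
    return refined, edges
```

Because a scoring function that never clears the threshold yields an empty edge list, the purely parallel mode appears as a special case of the same procedure, which mirrors the subsumption property stated in the abstract.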
Problem

Research questions and friction points this paper is trying to address.

multi-agent systems
parallel execution
sequential communication
response accuracy
communication efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

response-conditioned policy
hybrid multi-agent orchestration
sparse communication graph
trainable coordination
generalizable communication policy