AI Summary
Existing parallel and sequential multi-agent collaboration paradigms struggle to balance communication overhead, latency, and response accuracy at the same time. This work proposes Nexa, a framework that introduces the first response-driven adaptive hybrid execution mechanism. After an initial parallel execution phase, Nexa uses a lightweight Transformer-based policy network to generate a sparse directed acyclic communication graph, which determines whether a single round of sequential message propagation is triggered. The approach requires no external critic models or handcrafted topologies. Instead, it is trained with policy-gradient optimization and embeds the agents' responses into a shared semantic space, from which it predicts the communication structure. This design lets the learned policy generalize across tasks, agent counts, and base models. Experiments show that Nexa significantly reduces communication cost and latency while improving accuracy, and that its learned communication policies transfer to and can be reused in new settings.
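The parallel-then-optional-sequential execution described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The agent interface `agent(query, messages)`, the fixed agent ordering, and the `edge_probs` dictionary, which stands in for the output of the Transformer policy, are all hypothetical, as are the helper names `sample_dag_edges` and `hybrid_round`. Acyclicity is enforced here by admitting only edges (i, j) with i < j. This is one standard way to obtain a DAG by construction, and the paper's own construction may differ.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical agent interface: (query, incoming messages) -> response text.
Agent = Callable[[str, List[str]], str]
Edge = Tuple[int, int]


def sample_dag_edges(edge_probs: Dict[Edge, float], rng: random.Random) -> Set[Edge]:
    """Sample a sparse edge set from per-edge probabilities.

    Only forward edges (i, j) with i < j are kept, so every sampled graph is
    acyclic by construction (assumes a fixed agent ordering).
    """
    edges: Set[Edge] = set()
    for (i, j), p in edge_probs.items():
        if i >= j:
            continue  # self-loops and backward edges could close a cycle
        if rng.random() < p:
            edges.add((i, j))
    return edges


def hybrid_round(
    agents: List[Agent],
    query: str,
    edge_probs: Dict[Edge, float],
    rng: random.Random,
) -> Tuple[List[str], Set[Edge]]:
    # Parallel stage: every agent answers independently.
    responses = [agent(query, []) for agent in agents]

    # Graph prediction: edge_probs stands in for the policy network's output.
    edges = sample_dag_edges(edge_probs, rng)
    if not edges:
        return responses, edges  # empty graph: stay purely parallel

    # Sequential stage: one pass in index order, which is a topological order
    # because every edge points from a lower to a higher index.
    refined = list(responses)
    for j, agent in enumerate(agents):
        senders = sorted(i for (i, k) in edges if k == j)
        if senders:
            refined[j] = agent(query, [refined[i] for i in senders])
    return refined, edges
```

With all edge probabilities at zero, `hybrid_round` returns the parallel responses unchanged. When edges (0, 1) and (1, 2) are sampled, agent 2 receives agent 1's already-refined answer, so messages flow exactly once along the graph.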
Abstract
Multi-agent systems can solve complex tasks through collaboration among multiple Large Language Model (LLM) agents. Existing collaboration frameworks typically operate in either a parallel or a sequential mode. In the parallel mode, agents respond to a query independently and their responses are then aggregated. In the sequential mode, agents communicate via a directed topology and refine one another's responses step by step. However, neither mode achieves the desired objectives of minimizing communication and latency while simultaneously maximizing the accuracy of the final response. In this work, we introduce Nexa, a hybrid paradigm built on a trainable response-conditioned policy that bridges the gap between the two modes. Nexa begins with a parallel execution stage, embeds the resulting responses into a shared semantic space, and then predicts a sparse directed acyclic communication graph. If the graph is empty, the system remains purely parallel; if it is non-empty, the system performs one round of sequential message propagation along the graph. The policy is a lightweight Transformer model, and the method requires neither external LLM judges or reward models nor hand-crafted test-time topology search. We formalize this hybrid execution problem, show that the resulting graph is acyclic by construction and that the framework strictly subsumes pure parallel execution, and present a training procedure based on policy-gradient optimization. Results demonstrate that the response-conditioned policy learned by Nexa in one setting can be reused when the number of agents, the task, or the underlying agent model changes, highlighting the generalizability of the learned communication policy.
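The policy-gradient training mentioned above can be sketched with REINFORCE over independent Bernoulli edge variables. This is a deliberate simplification under stated assumptions. Each allowed edge (i, j) with i < j gets a single learnable logit, instead of a score produced by a Transformer conditioned on response embeddings. The reward function `toy_reward`, a stand-in for task accuracy minus a per-edge communication cost, is hypothetical, as are `reinforce_step` and `train`. The update relies on the identity d/dz log Bernoulli(a; sigmoid(z)) = a - sigmoid(z), together with a moving-average baseline to reduce variance.

```python
import math
import random
from typing import Callable, Dict, Iterable, Set, Tuple

Edge = Tuple[int, int]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def reinforce_step(
    logits: Dict[Edge, float],
    reward_fn: Callable[[Set[Edge]], float],
    rng: random.Random,
    lr: float,
    baseline: float,
) -> float:
    """One REINFORCE update for independent Bernoulli edge variables."""
    probs = {e: sigmoid(z) for e, z in logits.items()}
    edges = {e for e, p in probs.items() if rng.random() < p}
    reward = reward_fn(edges)
    advantage = reward - baseline
    for e, p in probs.items():
        a = 1.0 if e in edges else 0.0
        # Score function: d/dz log Bernoulli(a; sigmoid(z)) = a - sigmoid(z).
        logits[e] += lr * advantage * (a - p)
    return reward


def train(
    reward_fn: Callable[[Set[Edge]], float],
    allowed_edges: Iterable[Edge],
    steps: int = 2000,
    lr: float = 0.5,
    seed: int = 0,
) -> Dict[Edge, float]:
    rng = random.Random(seed)
    # Only forward edges (i < j) get a logit, so every sample is a DAG.
    logits = {(i, j): 0.0 for (i, j) in allowed_edges if i < j}
    baseline = 0.0
    for _ in range(steps):
        reward = reinforce_step(logits, reward_fn, rng, lr, baseline)
        baseline = 0.9 * baseline + 0.1 * reward  # moving-average baseline
    return logits


def toy_reward(edges: Set[Edge]) -> float:
    # Hypothetical reward: the answer counts as correct only if agent 0
    # messages agent 1, and every edge costs 0.1 as communication overhead.
    accuracy = 1.0 if (0, 1) in edges else 0.0
    return accuracy - 0.1 * len(edges)
```

Under this toy reward, training should raise the probability of edge (0, 1), which pays for its cost, while edges that only add communication drift toward being dropped. This mirrors, in miniature, the accuracy versus communication trade-off the policy is meant to learn.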