🤖 AI Summary
This work addresses the limitations of existing large language model (LLM)-based multi-agent systems, which are often task-specific and rely on handcrafted roles and natural language communication, leading to architectural complexity, error propagation, and poor generalization. The authors propose “agent primitives” as universal building blocks that decompose collaboration into three reusable computational patterns: Review, Voting and Selection, and Planning and Execution. Primitives communicate internally through the key-value (KV) cache rather than through text, which makes communication more efficient and robust, while an Organizer agent automatically composes primitives for each query based on historically successful configurations. This approach introduces neural modularization into multi-agent systems for the first time, enhancing generality, stability, and efficiency: it improves average accuracy by 12.0–16.5% over single-agent baselines, reduces token consumption and inference latency by approximately 3–4× compared to text-based multi-agent systems, incurs only 1.3–1.6× the overhead of single-agent inference, and delivers more stable performance across model backbones.
📝 Abstract
While existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, they are often highly task-specific, relying on manually crafted agent roles and interaction prompts, which increases architectural complexity and limits reusability across tasks. Moreover, most MAS communicate primarily through natural language, which makes them vulnerable to error accumulation and instability as internal agent histories grow over long-context, multi-stage interactions. In this work, we propose Agent Primitives, a set of reusable latent building blocks for LLM-based MAS. Inspired by neural network design, where complex models are built from reusable components, we observe that many existing MAS architectures can be decomposed into a small number of recurring internal computation patterns. Based on this observation, we instantiate three primitives: Review, Voting and Selection, and Planning and Execution. All primitives communicate internally via the key-value (KV) cache, which improves both robustness and efficiency by mitigating information degradation across multi-stage interactions. To enable automatic system construction, an Organizer agent selects and composes primitives for each query, guided by a lightweight knowledge pool of previously successful configurations, forming a primitive-based MAS. Experiments show that primitive-based MAS improve average accuracy by 12.0–16.5% over single-agent baselines and reduce token usage and inference latency by approximately 3–4× compared to text-based MAS, while incurring only 1.3–1.6× overhead relative to single-agent inference and providing more stable performance across model backbones.
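The Organizer step described above can be pictured with a small sketch. This is not the authors' implementation: the abstract does not say how the knowledge pool is stored or searched, so the `PoolEntry` record, the `organize` function, the word-overlap (Jaccard) similarity, and the fallback composition are all hypothetical choices made for illustration. The sketch also covers only the selection logic and does not model KV-cache communication between primitives.

```python
from dataclasses import dataclass

# The three primitive names come from the abstract; everything else is assumed.
PRIMITIVES = ("review", "voting_and_selection", "planning_and_execution")


@dataclass(frozen=True)
class PoolEntry:
    """Hypothetical knowledge-pool record: a past query and the
    primitive composition that worked for it."""
    query: str
    composition: tuple

    def __post_init__(self):
        # Reject compositions that use names outside the three primitives.
        unknown = [p for p in self.composition if p not in PRIMITIVES]
        if unknown:
            raise ValueError(f"unknown primitives: {unknown}")


def _tokens(text):
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity, an assumed stand-in for whatever
    retrieval signal the real Organizer uses."""
    ta, tb = _tokens(a), _tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def organize(query, pool, default=("review",)):
    """Return the composition of the most similar past query,
    or an assumed default when nothing in the pool overlaps."""
    best = max(pool, key=lambda e: jaccard(query, e.query), default=None)
    if best is None or jaccard(query, best.query) == 0.0:
        return default
    return best.composition


# Toy pool with made-up entries, for demonstration only.
pool = [
    PoolEntry("prove the inequality step by step",
              ("planning_and_execution", "review")),
    PoolEntry("choose the best answer to the multiple choice question",
              ("voting_and_selection",)),
]

selected = organize("choose the best option for this multiple choice question", pool)
```

In this toy setup, a multiple-choice query is routed to the composition stored with the most similar past query, and an unseen kind of query falls back to the default. A real Organizer is itself an agent and would presumably reason over the pool rather than apply a fixed similarity rule.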