Latent Collaboration in Multi-Agent Systems

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing text-mediated multi-agent systems (MAS) built on large language models (LLMs) suffer from low inference efficiency and severe information loss during reasoning and inter-agent communication. Method: We propose a training-free, end-to-end framework that enables implicit agent collaboration within a continuous latent space: each agent initializes from the LLM's final-layer hidden embeddings, then autoregressively generates "latent thoughts" that populate a shared latent working memory for lossless, high-bandwidth information exchange. Contribution/Results: We theoretically establish that this mechanism achieves superior expressivity, zero information loss, and reduced computational complexity, overcoming the fundamental bottleneck of text mediation. Empirically, on nine benchmark tasks, our approach improves accuracy by up to 14.6%, reduces output token count by 70.8%–83.7%, and accelerates end-to-end inference by 4.0×–4.3×.
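The mechanism above can be illustrated with a toy sketch: instead of decoding hidden states into tokens, each agent feeds its final-layer hidden state back in as the next input embedding, and agents exchange those raw vectors through a shared memory. Everything here is illustrative, not the paper's implementation: `last_hidden` is a stand-in for a real LLM forward pass, and all dimensions and names are hypothetical.

```python
import numpy as np

# Hypothetical stand-in for an LLM forward pass: maps a sequence of D-dim
# input embeddings to the final-layer hidden state of the last position.
rng = np.random.default_rng(0)
D = 8
W = rng.standard_normal((D, D)) / np.sqrt(D)

def last_hidden(embeddings: np.ndarray) -> np.ndarray:
    # Mean-pool the context, then a linear map + tanh as a toy "layer".
    return np.tanh(embeddings.mean(axis=0) @ W)

def generate_latent_thoughts(context: np.ndarray, n_thoughts: int) -> np.ndarray:
    """Autoregressive latent rollout: each step feeds the previous hidden
    state back as the next input embedding, skipping token decoding."""
    ctx = context.copy()
    thoughts = []
    for _ in range(n_thoughts):
        h = last_hidden(ctx)          # final-layer hidden state ("latent thought")
        thoughts.append(h)
        ctx = np.vstack([ctx, h])     # re-inject the hidden state as input
    return np.stack(thoughts)

# Shared latent working memory: agents append raw hidden states, so the next
# agent conditions on them directly, with no lossy text serialization.
shared_memory: list[np.ndarray] = []

prompt = rng.standard_normal((3, D))
agent1_thoughts = generate_latent_thoughts(prompt, n_thoughts=4)
shared_memory.extend(agent1_thoughts)

# A second agent initializes its context from the shared memory + its own prompt.
agent2_context = np.vstack([np.stack(shared_memory), rng.standard_normal((2, D))])
agent2_thoughts = generate_latent_thoughts(agent2_context, n_thoughts=4)
print(agent2_thoughts.shape)  # (4, 8)
```

The key design point this sketch captures is that inter-agent transfer happens at the level of hidden vectors, so no information is lost to tokenization and no tokens are decoded during collaboration.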

📝 Abstract
Multi-agent systems (MAS) extend large language models (LLMs) from independent single-model reasoning to coordinative system-level intelligence. While existing LLM agents depend on text-based mediation for reasoning and communication, we take a step forward by enabling models to collaborate directly within the continuous latent space. We introduce LatentMAS, an end-to-end training-free framework that enables pure latent collaboration among LLM agents. In LatentMAS, each agent first performs auto-regressive latent thought generation through its last-layer hidden embeddings. A shared latent working memory then preserves and transfers each agent's internal representations, ensuring lossless information exchange. We provide theoretical analyses establishing that LatentMAS attains higher expressiveness and lossless information preservation with substantially lower complexity than vanilla text-based MAS. In addition, empirical evaluations across 9 comprehensive benchmarks spanning math and science reasoning, commonsense understanding, and code generation show that LatentMAS consistently outperforms strong single-model and text-based MAS baselines, achieving up to 14.6% higher accuracy, reducing output token usage by 70.8%–83.7%, and providing 4.0×–4.3× faster end-to-end inference. These results demonstrate that our new latent collaboration framework enhances system-level reasoning quality while offering substantial efficiency gains without any additional training. Code and data are fully open-sourced at https://github.com/Gen-Verse/LatentMAS.
Problem

Research questions and friction points this paper is trying to address.

Enabling direct collaboration between LLM agents in latent space
Reducing communication overhead in multi-agent systems
Improving reasoning efficiency without additional training requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enables direct agent collaboration in a continuous latent space, without token decoding
Uses a shared latent working memory for lossless exchange of internal representations
Requires no additional training while improving accuracy and reducing token usage