🤖 AI Summary
This paper addresses the substantial decoding overhead, information loss, and computational cost of natural language–based communication among large language model (LLM) agents by introducing a zero-parameter, cross-model collaborative inference paradigm that uses intermediate-layer neural activations as the communication medium. A pause-fuse-resume mechanism directly exchanges and fuses hidden states across different LLMs at intermediate Transformer layers, supporting plug-and-play, training-free fusion operations (concatenation, weighted summation, and gating) with no additional parameters, data, or fine-tuning. On multi-agent coordination and complex reasoning benchmarks, the method achieves up to a 27.0% absolute accuracy gain over natural language communication with less than one-quarter of the computational overhead, while also improving generalization and inference stability.
📝 Abstract
Communication between multiple language model (LM) agents has been shown to scale up the reasoning ability of LMs. While natural language has been the dominant medium for inter-LM communication, it is not obvious that this should be the standard: not only does natural language communication incur high inference costs that scale quickly with both the number of agents and the number of messages, but the decoding process also abstracts away rich information that could otherwise be accessed from the internal activations. In this work, we propose a simple technique whereby LMs communicate via activations. Concretely, we pause an LM $B$'s computation at an intermediate layer, combine its current activation with another LM $A$'s intermediate activation via some function $f$, then pass $f$'s output into the next layer of $B$ and continue the forward pass until decoding is complete. This approach scales up LMs on new tasks with zero additional parameters and data, and saves a substantial amount of compute over natural language communication. We test our method with various functional forms of $f$ on two experimental setups--multi-player coordination games and reasoning benchmarks--and find that it achieves up to a $27.0\%$ improvement over natural language communication across datasets with $<1/4$ the compute, illustrating the superiority and robustness of activations as an alternative "language" for communication between LMs.
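To make the pause-fuse-resume idea concrete, here is a minimal sketch of the three parameter-free fusion functions $f$ named above (weighted summation, gating, and concatenation), operating on toy hidden states represented as NumPy arrays. The function names and shapes are illustrative assumptions, not the paper's actual implementation; in a real system the fused state would be fed into LM $B$'s next Transformer layer to resume its forward pass.

```python
import numpy as np

def fuse_sum(h_a, h_b, alpha=0.5):
    """Weighted summation: blend A's and B's hidden states (zero parameters)."""
    return alpha * h_a + (1.0 - alpha) * h_b

def fuse_gate(h_a, h_b):
    """Gating: a parameter-free sigmoid gate computed from B's own state
    decides, per dimension, how much of A's state to let through."""
    gate = 1.0 / (1.0 + np.exp(-h_b))  # element-wise sigmoid of h_b
    return gate * h_a + (1.0 - gate) * h_b

def fuse_concat(h_a, h_b):
    """Concatenation along the sequence axis: B's later layers can attend
    to A's hidden states as extra positions, without new parameters."""
    return np.concatenate([h_a, h_b], axis=0)

# Toy intermediate-layer hidden states with shape (seq_len, hidden_dim).
h_a = np.random.randn(4, 8)  # LM A's activation at the paused layer
h_b = np.random.randn(4, 8)  # LM B's activation at the same depth

fused = fuse_sum(h_a, h_b)
# B's forward pass would resume from `fused` at the next layer.
```

Note that `fuse_sum` and `fuse_gate` preserve the hidden-state shape, so they can be dropped into an unmodified layer stack, whereas `fuse_concat` lengthens the sequence, which self-attention handles naturally but which raises per-layer cost.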