KVComm: Enabling Efficient LLM Communication through Selective KV Sharing

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
LLM-based multi-agent systems face significant challenges, including high communication overhead, substantial information loss and bias, and low computational efficiency. To address these issues, the authors propose KVComm, an efficient communication protocol based on selective key-value (KV) sharing. The method introduces a layer-wise KV selection strategy that combines attention importance scores with a Gaussian prior over layer depth, so that only ~30% of Transformer layers' KV pairs need to be transmitted while approaching the performance upper bound of full input merging. KVComm also reuses the existing KV caching mechanism to identify and propagate the most informative states across agents. Extensive experiments across diverse tasks and model pairs show that KVComm matches the performance of the input-merging upper bound while reducing communication volume by roughly 70%, substantially improving the coordination efficiency of LLM-based multi-agent systems.

📝 Abstract
Large Language Models (LLMs) are increasingly deployed in multi-agent systems, where effective inter-model communication is crucial. Existing communication protocols either rely on natural language, incurring high inference costs and information loss, or on hidden states, which suffer from information concentration bias and inefficiency. To address these limitations, we propose KVComm, a novel communication framework that enables efficient communication between LLMs through selective sharing of KV pairs. KVComm leverages the rich information encoded in the KV pairs while avoiding the pitfalls of hidden states. We introduce a layer-wise KV selection strategy based on attention importance scores with a Gaussian prior to identify the most informative KV pairs for communication. Extensive experiments across diverse tasks and model pairs demonstrate that KVComm achieves comparable performance to the upper-bound method, which directly merges the inputs into one model without any communication, while transmitting as few as 30% of layers' KV pairs. Our study highlights the potential of KV pairs as an effective medium for inter-LLM communication, paving the way for scalable and efficient multi-agent systems.
Problem

Research questions and friction points this paper is trying to address.

Enabling efficient communication between large language models
Selectively sharing KV pairs to reduce transmission costs
Addressing limitations of natural language and hidden state protocols
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective KV sharing for efficient LLM communication
Layer-wise selection using attention importance scores
Transmits as few as 30% of layers' KV pairs while maintaining performance
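The selection strategy above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the per-layer attention importance scores, and the Gaussian prior parameters (`mu_frac`, `sigma_frac`) are all assumptions made for the example; the paper only states that importance scores are combined with a Gaussian prior to pick the layers whose KV pairs are shared.

```python
import numpy as np

def select_kv_layers(attn_scores, share_ratio=0.3, mu_frac=0.5, sigma_frac=0.2):
    """Hypothetical layer-wise KV selection.

    attn_scores: per-layer attention importance (one value per Transformer layer).
    Combines the scores with a Gaussian prior over layer depth, then keeps the
    top `share_ratio` fraction of layers; only those layers' KV pairs would be
    transmitted to the receiving agent.
    """
    scores = np.asarray(attn_scores, dtype=float)
    num_layers = scores.shape[0]
    layers = np.arange(num_layers)
    mu = mu_frac * (num_layers - 1)      # prior centred mid-depth (assumption)
    sigma = sigma_frac * num_layers
    prior = np.exp(-0.5 * ((layers - mu) / sigma) ** 2)
    combined = scores * prior
    k = max(1, int(round(share_ratio * num_layers)))
    # Indices of the k layers with the highest combined score, in layer order.
    return sorted(np.argsort(combined)[-k:].tolist())
```

With uniform importance scores the Gaussian prior dominates, so the selected ~30% of layers cluster around the middle of the network; in practice the attention scores would shift this set toward whichever layers carry the most task-relevant information.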