When Less Latent Leads to Better Relay: Information-Preserving Compression for Latent Multi-Agent LLM Collaboration

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high communication overhead and memory consumption of transmitting the full KV cache in multi-agent large language model collaboration, which hinders efficient context sharing. The authors propose Orthogonal Backfill (OBF), a mechanism that builds on eviction-based KV compression by injecting a low-rank orthogonal residual of the discarded KV states into the retained KV cache, so that less information is lost while communication cost still falls. Across nine standard benchmarks, the compressed relay reduces communication volume by 79.8%–89.4% while matching or surpassing full KV relay, and OBF achieves the best results on seven of the nine. These findings support the principle that “less but more precise” information transfer can outperform “more but exhaustive” transmission.
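To put the reported reduction in concrete terms, the short calculation below converts a percentage reduction into the fraction of the full-KV payload that is still transmitted. The helper name `relay_fraction` is hypothetical and used only for illustration; the two percentages are the endpoints quoted in the summary.

```python
def relay_fraction(reduction_pct: float) -> float:
    """Fraction of the full-KV payload still sent after a given percent reduction."""
    return 1.0 - reduction_pct / 100.0


# Endpoints of the reported range of communication savings.
for pct in (79.8, 89.4):
    frac = relay_fraction(pct)
    print(f"{pct}% reduction -> {frac:.3f} of the full payload")
```

In other words, the relayed message is roughly one fifth to one tenth of the size of a full KV relay.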

📝 Abstract

Communication in Large Language Model (LLM)-based multi-agent systems is moving beyond discrete tokens to preserve richer context. Recent work such as LatentMAS enables agents to exchange latent messages through full key-value (KV) caches. However, full KV relay incurs high memory and communication costs. We adapt eviction-style KV compression to this setting and introduce Orthogonal Backfill (OBF) to mitigate the information loss caused by hard eviction. OBF injects a low-rank orthogonal residual from the discarded KV states into the retained KV states. We evaluate the proposed method against full KV relay on nine standard benchmarks spanning mathematical reasoning, coding, and knowledge-intensive QA. It achieves performance comparable to full KV relay while reducing communication cost by 79.8%–89.4%. OBF further improves performance, achieving the best results on 7 of the 9 benchmarks. This suggests that more information does not necessarily lead to better communication; preserving the most useful information matters more. Our codebase is publicly available at https://github.com/markli404/When-Less-Latent-Leads-to-Better-Relay.
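The idea of evicting states and then backfilling an orthogonal residual can be illustrated with a minimal NumPy sketch on a single matrix of states (for example, the keys of one attention head). This is not the authors' implementation, which is in the linked repository. The function name `evict_and_backfill`, the importance `scores`, the mixing strength `alpha`, and the softmax-similarity weighting used to distribute the residual are all assumptions made for illustration; the abstract only specifies that a low-rank orthogonal residual of the discarded states is injected into the retained ones.

```python
import numpy as np


def evict_and_backfill(states, scores, keep, rank, alpha=0.1):
    """Illustrative sketch only (assumes 0 < keep < len(states)).

    states: (n, d) matrix of key or value vectors.
    scores: (n,) importance scores; higher means keep (hypothetical criterion).
    """
    # Hard eviction: keep the `keep` highest-scoring rows, in original order.
    order = np.argsort(scores)[::-1]
    kept_idx = np.sort(order[:keep])
    drop_idx = np.sort(order[keep:])
    kept, dropped = states[kept_idx], states[drop_idx]

    # Orthonormal basis (columns of q) for the span of the retained rows.
    q, _ = np.linalg.qr(kept.T)
    # Part of each discarded row that the retained rows cannot represent.
    residual = dropped - (dropped @ q) @ q.T

    # Low-rank approximation of that residual via truncated SVD.
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    r = min(rank, s.size)
    low_rank = (u[:, :r] * s[:r]) @ vt[:r]

    # Assumption: each retained row absorbs residual from the discarded rows
    # it is most similar to, using softmax weights over dot products.
    logits = kept @ dropped.T / np.sqrt(states.shape[1])
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)

    return kept + alpha * weights @ low_rank, kept_idx, q


rng = np.random.default_rng(0)
states = rng.standard_normal((12, 16))   # 12 cached positions, dimension 16
scores = rng.random(12)                  # stand-in importance scores
compact, kept_idx, q = evict_and_backfill(states, scores, keep=4, rank=2)
print(compact.shape)
```

In this sketch the relayed matrix has only `keep` rows, and the term added to each retained row lies entirely outside the span of the retained rows, so it carries only information that hard eviction would otherwise have discarded.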

Problem

Research questions and friction points this paper is trying to address.

Latent Communication

Multi-Agent LLM

KV Compression

Information Preservation

Communication Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Multi-Agent Collaboration

KV Cache Compression

Orthogonal Backfill