LCGuard: Latent Communication Guard for Safe KV Sharing in Multi-Agent Systems

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the risk that sensitive context or private information leaks when agents in a multi-agent system share Transformer key-value (KV) caches as an implicit communication channel. It formally defines, for the first time, a representation-level information leakage threat based on input reconstructability, and introduces an adversarial training framework: a defender learns representation-level transformations of the KV cache that prevent reconstruction of sensitive inputs, while an attacker attempts to recover the original inputs from the transformed cache. Experiments across multiple model families and multi-agent benchmarks show that the method reduces both reconstruction-based leakage and attack success rates, while keeping task performance competitive with standard KV sharing.
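
One way to write down the defender/attacker game described above is as a min-max objective. The formulation below is only an illustrative sketch: the symbols $C$ (the KV cache to be shared), $x$ (the sensitive input), $T_\theta$ (the defender's transformation), $D_\phi$ (the attacker's decoder), the two loss terms, and the trade-off weight $\lambda \ge 0$ are all notation assumed here, not taken from the paper.

```latex
\min_{\theta} \; \Big[ \mathcal{L}_{\text{task}}\big(T_\theta(C)\big)
  \;-\; \lambda \, \min_{\phi} \, \mathcal{L}_{\text{rec}}\big(D_\phi(T_\theta(C)),\, x\big) \Big]
```

Under this reading, the inner minimization trains the attacker to reconstruct $x$ as well as it can, and the outer minimization rewards transformations that keep the task loss low while making even the best attacker's reconstruction loss high.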

📝 Abstract

Large language model (LLM)-based multi-agent systems increasingly rely on intermediate communication to coordinate complex tasks. While most existing systems communicate through natural language, recent work shows that latent communication, particularly through transformer key-value (KV) caches, can improve efficiency and preserve richer task-relevant information. However, KV caches also encode contextual inputs, intermediate reasoning states, and agent-specific information, creating an opaque channel through which sensitive content may propagate across agents without explicit textual disclosure. To address this, we introduce LCGuard (Latent Communication Guard), a framework for safe KV-based latent communication in multi-agent LLM systems. LCGuard treats shared KV caches as latent working memory and learns representation-level transformations before cache artifacts are transmitted across agents. We formalize representation-level sensitive information leakage operationally through reconstruction: a shared cache artifact is unsafe if an adversarial decoder can recover agent-specific sensitive inputs from it. This leads to an adversarial training formulation in which the adversary learns to reconstruct sensitive inputs, while LCGuard learns transformations that preserve task-relevant semantics and reduce reconstructable information. Empirical evaluations across multiple model families and multi-agent benchmarks show that LCGuard consistently reduces reconstruction-based leakage and attack success rates while maintaining competitive task performance compared to standard KV-sharing baselines.
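
The reconstruction-based notion of leakage in the abstract can be illustrated with a small toy experiment. The sketch below is not LCGuard: it uses synthetic vectors in place of real KV caches, a least-squares linear decoder in place of the learned adversary, and a fixed cross-covariance projection in place of the learned transformation. All function names (`make_toy_cache`, `guard_cache`, `leakage`, and so on) are hypothetical and chosen only for this example.

```python
import numpy as np


def make_toy_cache(num_samples=2000, dim=16, seed=0):
    """Synthetic stand-in for a shared cache: a linear mix of a sensitive
    signal and a task signal plus small noise (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    sensitive = rng.normal(size=(num_samples, 2))  # hypothetical private inputs
    task = rng.normal(size=(num_samples, 2))       # hypothetical task-relevant signal
    mix_sensitive = rng.normal(size=(2, dim))
    mix_task = rng.normal(size=(2, dim))
    noise = 0.05 * rng.normal(size=(num_samples, dim))
    cache = sensitive @ mix_sensitive + task @ mix_task + noise
    return cache, sensitive, task


def fit_linear_decoder(features, targets):
    """Linear adversary: weights minimizing ||features @ W - targets||^2."""
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights


def reconstruction_r2(features, targets, weights):
    """Fraction of target variance the decoder recovers (near 1.0 = high leakage)."""
    residual = targets - features @ weights
    centered = targets - targets.mean(axis=0)
    return 1.0 - (residual ** 2).sum() / (centered ** 2).sum()


def leakage(cache, target):
    """Fit a linear decoder on the cache and score it on the same data
    (in-sample, which is the most favorable case for the adversary)."""
    return reconstruction_r2(cache, target, fit_linear_decoder(cache, target))


def guard_cache(cache, sensitive):
    """Simple linear stand-in for a defender: project out the directions of
    the cache that are linearly correlated with the sensitive inputs."""
    cross_cov = cache.T @ sensitive          # shape (dim, n_sensitive)
    basis, _ = np.linalg.qr(cross_cov)       # orthonormal basis of those directions
    projection = np.eye(cache.shape[1]) - basis @ basis.T
    return cache @ projection


if __name__ == "__main__":
    cache, sensitive, task = make_toy_cache()
    guarded = guard_cache(cache, sensitive)
    print("leakage, raw cache:     ", leakage(cache, sensitive))
    print("leakage, guarded cache: ", leakage(guarded, sensitive))
    print("task signal, raw cache: ", leakage(cache, task))
    print("task signal, guarded:   ", leakage(guarded, task))
```

The projection here only defeats a linear decoder evaluated on the data it was fitted to; a nonlinear adversary could still recover information that a linear one cannot, which is one reason a learned attacker and a learned defender, as in the adversarial formulation the abstract describes, would be needed in practice.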

Problem

Research questions and friction points this paper is trying to address.

latent communication

KV sharing

multi-agent systems

sensitive information leakage

privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Communication

KV Cache Sharing

Adversarial Training