Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

career value

255K/year

🤖 AI Summary

This work addresses the challenge of bandwidth-constrained multi-agent reinforcement learning (MARL), where conventional approaches suffer performance degradation due to the entanglement of communication and policy representations, causing compression to adversely affect policy efficacy. To overcome this limitation, the authors propose a decoupled architecture that separates communication from policy learning via dedicated communication channels and introduces a normalized bandwidth budget β, enabling, for the first time, an isolated analysis of communication overhead and policy capacity. The method employs a lightweight SLIM design, end-to-end training, and explicit modeling of partially observable environments. Evaluated across multiple MARL benchmarks, it achieves state-of-the-art performance while maintaining strong robustness and scalability even under severe bandwidth compression.

📝 Abstract

Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce $β$, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially-observable MARL benchmarks, where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.

Problem

Research questions and friction points this paper is trying to address.

multi-agent reinforcement learning

bandwidth constraints

communication

policy decoupling

latent representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

decoupled communication

bandwidth-constrained MARL

SLIM architecture