Decoupling Communication from Policy: Robust MARL under Bandwidth Constraints

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses bandwidth-constrained multi-agent reinforcement learning (MARL). Conventional approaches entangle communication and policy representations, so compressing messages also shrinks the policy's latent space and degrades performance. The authors propose SLIM, a lightweight architecture that decouples the communication pathway from the policy's latent representation, and introduce a normalized per-agent bandwidth budget β, which allows the effect of bandwidth to be analyzed in isolation from policy capacity. The model is trained end to end and targets partially observable environments. Evaluated on several partially observable MARL benchmarks, it is reported to achieve state-of-the-art performance, with only marginal degradation and good scalability under severe bandwidth compression.
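The decoupling idea can be illustrated with a small sketch. This is not the authors' SLIM implementation: the class name `DecoupledAgents`, the single linear layers, the mean aggregation of incoming messages, and the additive fusion are all assumptions chosen for brevity. What it shows is only the structural point, namely that the message width (`msg_dim`) is set independently of the policy latent width (`latent_dim`), and that messages are exchanged within the same step before actions are computed.

```python
import math
import random


def make_linear(in_dim, out_dim, rng):
    """Random weight matrix (in_dim x out_dim); a stand-in for a learned layer."""
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(out_dim)] for _ in range(in_dim)]


def apply_linear(x, weights):
    """Multiply a vector x (length in_dim) by weights (in_dim x out_dim)."""
    out_dim = len(weights[0])
    return [sum(xi * row[j] for xi, row in zip(x, weights)) for j in range(out_dim)]


class DecoupledAgents:
    """Hypothetical illustration: separate encoders for policy latent and messages."""

    def __init__(self, obs_dim, latent_dim, msg_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.latent_dim = latent_dim
        self.msg_dim = msg_dim
        self.w_obs = make_linear(obs_dim, latent_dim, rng)   # policy latent encoder
        self.w_msg = make_linear(obs_dim, msg_dim, rng)      # separate message encoder
        self.w_in = make_linear(msg_dim, latent_dim, rng)    # receiver-side projection
        self.w_pi = make_linear(latent_dim, n_actions, rng)  # policy head

    def step(self, observations):
        """One in-step exchange: encode, send messages, fuse, compute action logits."""
        latents = [[math.tanh(v) for v in apply_linear(o, self.w_obs)]
                   for o in observations]
        # Only these msg_dim-wide vectors cross the bandwidth-limited channel.
        messages = [[math.tanh(v) for v in apply_linear(o, self.w_msg)]
                    for o in observations]
        n = len(observations)
        logits = []
        for i in range(n):
            others = [messages[j] for j in range(n) if j != i]
            if others:
                # Assumed aggregation: mean of the other agents' messages.
                incoming = [sum(col) / len(others) for col in zip(*others)]
            else:
                incoming = [0.0] * self.msg_dim
            projected = apply_linear(incoming, self.w_in)
            fused = [h + p for h, p in zip(latents[i], projected)]
            logits.append(apply_linear(fused, self.w_pi))
        return messages, logits


agents = DecoupledAgents(obs_dim=6, latent_dim=32, msg_dim=2, n_actions=4)
obs = [[0.1 * k] * 6 for k in range(3)]
messages, logits = agents.step(obs)
```

In a coupled design the transmitted vector would be the latent itself, so reducing the message to two values would also reduce the policy latent to two values. Here `msg_dim=2` leaves `latent_dim=32` untouched.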
📝 Abstract
Communication enables coordination in multi-agent reinforcement learning (MARL), but many real-world applications, e.g., search-and-rescue with drone swarms, operate under severe bandwidth constraints. Many communication architectures still expose a coupled bottleneck in which a shared latent representation is used for both policy execution and inter-agent communication. Consequently, reducing message size directly limits the policy's latent space, often leading to significant performance degradation. We address this with two contributions. First, we introduce $β$, a normalised per-agent bandwidth budget that unifies sparsity, rounds, and message dimension into a single comparable constraint. Second, we provide SLIM, a minimal architecture that decouples the communication pathway from the policy's latent representation, allowing us to isolate the effect of bandwidth from the effect of policy capacity while benefiting from in-step communication. We evaluate our method on several partially-observable MARL benchmarks, where communication is essential. Our approach achieves state-of-the-art performance and exhibits scalability and robustness under limited communication, with only marginal degradation as bandwidth is reduced.
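To make the idea of a single normalised budget concrete, the sketch below combines sparsity, rounds, and message dimension into one number. The abstract does not give the formula for $β$, so the definition used here is an assumption: expected scalars sent per agent per environment step, divided by the same quantity for a chosen reference configuration. The names `CommConfig`, `scalars_per_step`, and `beta` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommConfig:
    """Hypothetical description of one agent's communication settings."""
    send_prob: float   # sparsity: expected fraction of steps in which the agent transmits
    rounds: int        # communication rounds per environment step
    message_dim: int   # scalars per message


def scalars_per_step(cfg: CommConfig) -> float:
    """Expected number of scalars one agent sends per environment step."""
    return cfg.send_prob * cfg.rounds * cfg.message_dim


def beta(cfg: CommConfig, reference: CommConfig) -> float:
    """Assumed normalisation: budget relative to a reference configuration."""
    ref = scalars_per_step(reference)
    if ref <= 0:
        raise ValueError("reference configuration must send a positive amount of data")
    return scalars_per_step(cfg) / ref


# Reference: always send, two rounds, 64-dimensional messages.
reference = CommConfig(send_prob=1.0, rounds=2, message_dim=64)

# Two different ways of cutting bandwidth that land on the same budget.
sparse = CommConfig(send_prob=0.25, rounds=1, message_dim=64)
narrow = CommConfig(send_prob=1.0, rounds=1, message_dim=16)

beta_sparse = beta(sparse, reference)  # 0.125
beta_narrow = beta(narrow, reference)  # 0.125
```

Under this assumed definition, a sparse sender with wide messages and a dense sender with narrow messages are directly comparable because both use one eighth of the reference budget.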
Problem

Research questions and friction points this paper is trying to address.

multi-agent reinforcement learning
bandwidth constraints
communication
policy decoupling
latent representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

decoupled communication
bandwidth-constrained MARL
SLIM architecture
normalized bandwidth budget
multi-agent reinforcement learning