🤖 AI Summary
To address poor communication scalability and low consensus efficiency in large-scale multi-agent reinforcement learning (MARL), this paper proposes ExpoComm, the first MARL communication protocol to incorporate exponential graph topologies. ExpoComm leverages the small diameter and sparsity of exponential graphs to enable efficient global information diffusion, eliminating reliance on pairwise link selection. It introduces a memory-augmented message encoder that implicitly embeds global state information into local messages, and integrates auxiliary prediction tasks with multi-task collaborative training to enhance representation learning. Evaluated on large-scale benchmarks, including MAgent and Infrastructure Management Planning, ExpoComm significantly outperforms existing methods in both performance and sample efficiency. Moreover, it demonstrates strong zero-shot transfer capability across diverse environments. The implementation is publicly available.
📝 Abstract
In cooperative multi-agent reinforcement learning (MARL), well-designed communication protocols can effectively facilitate consensus among agents, thereby enhancing task performance. Moreover, in large-scale multi-agent systems commonly found in real-world applications, effective communication plays an even more critical role due to the escalated challenge of partial observability compared to smaller-scale setups. In this work, we endeavor to develop a scalable communication protocol for MARL. Unlike previous methods that focus on selecting optimal pairwise communication links (a task that becomes increasingly complex as the number of agents grows), we adopt a global perspective on communication topology design. Specifically, we propose utilizing the exponential topology to enable rapid information dissemination among agents by leveraging its small-diameter and small-size properties. This approach leads to a scalable communication protocol, named ExpoComm. To fully unlock the potential of exponential graphs as communication topologies, we employ memory-based message processors and auxiliary tasks to ground messages, ensuring that they reflect global information and benefit decision-making. Extensive experiments on large-scale cooperative benchmarks, including MAgent and Infrastructure Management Planning, demonstrate the superior performance and robust zero-shot transferability of ExpoComm compared to existing communication strategies. The code is publicly available at https://github.com/LXXXXR/ExpoComm.
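The small-diameter and small-size properties mentioned above can be illustrated with a minimal sketch of a static exponential graph. It assumes the common textbook construction in which agent i sends to agents (i + 2^k) mod N; the exact topology variant used by ExpoComm is not specified in this abstract, and the function names `exponential_graph` and `diameter` are hypothetical, not taken from the released code.

```python
from collections import deque


def exponential_graph(n):
    """Out-neighbor lists for an assumed static exponential graph.

    Agent i sends to (i + 2**k) % n for k = 0 .. ceil(log2(n)) - 1.
    This is an illustrative construction, not necessarily ExpoComm's.
    """
    hops = (n - 1).bit_length() if n > 1 else 0  # equals ceil(log2(n))
    return {i: [(i + 2 ** k) % n for k in range(hops)] for i in range(n)}


def diameter(adj):
    """Longest shortest-path length over all ordered pairs, via BFS."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            raise ValueError("graph is not strongly connected")
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    n = 64
    adj = exponential_graph(n)
    n_edges = sum(len(nbrs) for nbrs in adj.values())
    # Compare against the n * (n - 1) directed links of all-to-all communication.
    print(f"agents={n} edges={n_edges} full={n * (n - 1)} diameter={diameter(adj)}")
```

Under this assumed construction, each agent has only about log2(N) outgoing links, yet any message can reach any other agent in at most ceil(log2(N)) hops, which is the intuition behind using the topology for scalable information dissemination.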