🤖 AI Summary
Existing communication protocols in multi-agent reinforcement learning (MARL) often suffer from high complexity, non-differentiability, and poor scalability. To address these issues, this paper proposes a lightweight, fully differentiable self-attention communication module that generates agent-specific messages in a reward-driven manner, enabling efficient, end-to-end trainable inter-agent information exchange. Because the module has a fixed number of trainable parameters, its parameter count does not grow with the number of agents, which improves scalability. It is also natively compatible with mainstream value-decomposition methods (e.g., QMIX) and functions as a plug-and-play enhancement. Evaluated on the SMAC benchmark, the approach achieves state-of-the-art performance on several maps, demonstrating its effectiveness and generality.
📝 Abstract
Communication is essential for humans to execute complex tasks collectively, which motivates interest in communication mechanisms for multi-agent reinforcement learning (MARL). However, existing communication protocols in MARL are often complex and non-differentiable. In this work, we introduce a self-attention-based communication module that exchanges information among agents in MARL. The proposed approach is fully differentiable, allowing agents to learn to generate messages in a reward-driven manner. The module can be seamlessly integrated with any action-value function decomposition method and can be viewed as an extension of such decompositions. Notably, it has a fixed number of trainable parameters, independent of the number of agents. Experimental results on the SMAC benchmark demonstrate the effectiveness of our approach, which achieves state-of-the-art performance on several maps.
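To make the fixed-parameter property concrete, the idea can be sketched as follows. This is a minimal, framework-free illustration (all names such as `SelfAttentionComm`, `W_q`, `W_k`, `W_v` are hypothetical, not from the paper): each agent's hidden state is projected through three weight matrices shared by all agents, and scaled dot-product attention mixes every agent's value vector into a per-agent message. The only trainable parameters are the three shared projections, so the parameter count is 3·d·d no matter how many agents communicate.

```python
import math
import random

def matmul(A, B):
    # Naive matrix multiply for small dense matrices (A: n x d, B: d x d).
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

class SelfAttentionComm:
    """Illustrative self-attention message module (hypothetical sketch).

    The shared projections W_q, W_k, W_v are the ONLY parameters, so the
    trainable-parameter count is 3 * d * d, independent of the agent count.
    """
    def __init__(self, d, seed=0):
        rng = random.Random(seed)
        init = lambda: [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(d)]
        self.d = d
        self.W_q, self.W_k, self.W_v = init(), init(), init()

    def num_params(self):
        return 3 * self.d * self.d

    def forward(self, H):
        # H: list of n agent hidden states, each a length-d vector.
        Q, K, V = matmul(H, self.W_q), matmul(H, self.W_k), matmul(H, self.W_v)
        scale = math.sqrt(self.d)
        messages = []
        for q in Q:
            # Attention weights of this agent over all agents (including itself).
            scores = softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in K])
            # Message to this agent: attention-weighted mix of all value vectors.
            messages.append([sum(w * v[j] for w, v in zip(scores, V)) for j in range(self.d)])
        return messages

comm = SelfAttentionComm(d=4)
H3 = [[0.1 * (i + j) for j in range(4)] for i in range(3)]   # team of 3 agents
H8 = [[0.05 * (i - j) for j in range(4)] for i in range(8)]  # team of 8 agents
print(len(comm.forward(H3)), len(comm.forward(H8)))  # one message per agent
print(comm.num_params())  # 48 parameters in both cases
```

The same `comm` instance handles 3 or 8 agents without any change in its weights, which is the sense in which the module's size is decoupled from team size; the resulting messages could then be concatenated to each agent's input in a value-decomposition method such as QMIX.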