🤖 AI Summary
This work addresses three challenges in communication-based multi-agent reinforcement learning (MARL): unstable communication protocols, ungrounded message semantics, and interference between communication learning and policy optimization. The authors propose SCALE-COMM, a self-supervised framework that uses shared, contrastively aligned latent embeddings to learn compact, low-dimensional, task-relevant messages while enforcing consistency across agents and over time. Training this dedicated communication space separately decouples communication learning from policy optimization. On standard MARL benchmarks and a realistic warehouse coordination task, the method is reported to outperform existing communication frameworks in representation quality and task performance, with improved stability, sample efficiency, and throughput under policy fine-tuning.
📝 Abstract
Emergent communication enables Autonomous Mobile Robots (AMRs) operating under partial observability to coordinate effectively in decentralized multi-agent reinforcement learning (MARL) settings. However, existing approaches often struggle with unstable communication protocols, ungrounded message semantics, and interference between communication learning and policy optimization, leading to degraded coordination over time. We propose SCALE-COMM (Shared, Contrastively-Aligned Latent Embeddings for COMMunication), a self-supervised framework for learning compact, stable, and policy-relevant communication representations. SCALE-COMM decouples communication learning from policy optimization by training low-dimensional latent messages that capture task-relevant planning and traffic information, while enforcing consistency across agents and time. Across standard MARL benchmarks and a realistic warehouse coordination task, SCALE-COMM consistently outperforms existing communication frameworks in both representation quality and task performance. The learned communication space yields improved stability, sample efficiency, and throughput under policy fine-tuning, demonstrating the effectiveness of representation-driven communication for scalable multi-agent coordination.
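The abstract does not state which contrastive objective SCALE-COMM uses. As a rough illustration of what "contrastively aligned" messages could mean, the sketch below implements a generic InfoNCE loss, a common choice for this kind of alignment and not necessarily the authors' loss. The function names (`l2_normalize`, `info_nce`), the temperature value, and the pairing of messages from two agents in the same situation as positives are all assumptions made for this example.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (an assumed stand-in, not the paper's stated loss).

    anchors[i] should be similar to positives[i] and dissimilar to every
    positives[j] with j != i. Similarity is cosine similarity / temperature.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching index i as the correct class.
        total += log_denominator - logits[i]
    return total / len(a)


# Hypothetical data: 16 situations, 8-dimensional latent messages.
rng = random.Random(0)
dim, batch = 8, 16
agent_a = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]

# Aligned case: agent B's message is a slightly perturbed copy of agent A's.
aligned_b = [[x + 0.05 * rng.gauss(0, 1) for x in msg] for msg in agent_a]

# Unaligned case: agent B's messages are unrelated to agent A's.
unaligned_b = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(batch)]

loss_aligned = info_nce(agent_a, aligned_b)
loss_unaligned = info_nce(agent_a, unaligned_b)
print(f"aligned: {loss_aligned:.3f}  unaligned: {loss_unaligned:.3f}")
```

Under these assumptions, the loss is low when two agents encode the same situation into nearby messages and high when their messages are unrelated. The same pairing scheme could be applied to consecutive time steps of one agent to encourage temporal consistency. Because a loss of this kind depends only on the message encoder, it can be minimized without policy gradients, which is one plausible reading of how communication learning is decoupled from policy optimization.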