Reducing Variance Caused by Communication in Decentralized Multi-agent Deep Reinforcement Learning

📅 2025-02-10

📈 Citations: 0

✨ Influential: 0

career value

260K/year

🤖 AI Summary

In decentralized multi-agent deep reinforcement learning (MADRL), communication facilitates coordination but introduces significant uncertainty, substantially increasing policy gradient variance and undermining training stability. This work presents the first theoretical modeling and quantitative analysis of communication-induced policy gradient variance. We propose a modular variance suppression framework that employs control variates to design a communication-aware gradient correction module—lightweight, plug-and-play, and compatible with mainstream algorithms including MAPPO and QMix. Evaluated on StarCraft II and Traffic Junction benchmarks, our approach consistently reduces policy gradient variance, improves convergence stability, and enhances final task performance. Empirical results validate both the effectiveness and generalizability of our variance modeling and suppression methodology across diverse cooperative multi-agent settings.

Technology Category

Application Category

📝 Abstract

In decentralized multi-agent deep reinforcement learning (MADRL), communication can help agents to gain a better understanding of the environment to better coordinate their behaviors. Nevertheless, communication may involve uncertainty, which potentially introduces variance to the learning of decentralized agents. In this paper, we focus on a specific decentralized MADRL setting with communication and conduct a theoretical analysis to study the variance that is caused by communication in policy gradients. We propose modular techniques to reduce the variance in policy gradients during training. We adopt our modular techniques into two existing algorithms for decentralized MADRL with communication and evaluate them on multiple tasks in the StarCraft Multi-Agent Challenge and Traffic Junction domains. The results show that decentralized MADRL communication methods extended with our proposed techniques not only achieve high-performing agents but also reduce variance in policy gradients during training.

Problem

Research questions and friction points this paper is trying to address.

Reduces variance in decentralized MADRL communication.

Addresses uncertainty in multi-agent policy gradients.

Enhances agent coordination and learning stability.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular techniques reduce variance

Enhance decentralized MADRL communication

Policy gradient variance minimized

🔎 Similar Papers

Robust Coordination under Misaligned Communication via Power Regularization