🤖 AI Summary
Problem: Deep neural network-based actor-critic methods for decentralized multi-agent reinforcement learning (MARL) lack global convergence guarantees: existing analyses are restricted to linear function approximation and ensure convergence only to stationary (rather than globally optimal) solutions.
Method: This paper proposes the first fully nonlinear deep neural network actor-critic framework for decentralized MARL, integrating decentralized consensus updates, stochastic policy gradient estimation, and nonconvex optimization analysis.
Contribution/Results: We establish rigorous global optimality convergence under mild assumptions and derive an $O(1/T)$ finite-time convergence rate. Experiments across diverse cooperative tasks demonstrate significant improvements over state-of-the-art baselines in both convergence speed and policy performance, thereby bridging the critical gap between empirical success and theoretical foundations in deep decentralized MARL.
📝 Abstract
Actor-critic methods for decentralized multi-agent reinforcement learning (MARL) enable collaborative optimal decision making without centralized coordination, opening up a wide range of practical applications. To date, however, most theoretical convergence studies of actor-critic decentralized MARL methods only guarantee convergence to a stationary solution under linear function approximation. This leaves a significant gap between the highly successful use of deep neural actor-critic methods for decentralized MARL in practice and the current theoretical understanding. To bridge this gap, in this paper we make the first attempt to develop a deep neural actor-critic method for decentralized MARL in which both the actor and critic components are inherently nonlinear. We show that our proposed method enjoys a global optimality guarantee with a finite-time convergence rate of O(1/T), where T is the total number of iterations. This is the first global convergence result for deep neural actor-critic methods in the MARL literature. We also conduct extensive numerical experiments that verify our theoretical results.
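To make the decentralized consensus actor-critic idea concrete, the following is a minimal, hedged sketch of one synchronous round, NOT the paper's exact algorithm: linear critics stand in for the deep networks, the score-function direction is a synthetic placeholder, and all features and rewards are made up. Each agent holds local critic and actor parameters; critic parameters are first averaged with neighbors through a doubly stochastic mixing matrix `W` (the consensus step), after which each agent takes a local TD(0) critic step and a policy-gradient-style actor step driven by its own TD error.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, dim = 4, 3

# Ring-topology doubly stochastic mixing matrix: each agent averages
# with its two neighbors (rows and columns each sum to 1).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

w0 = rng.normal(size=(n_agents, dim))      # critic params, one row per agent
theta0 = rng.normal(size=(n_agents, dim))  # actor params, one row per agent

def decentralized_ac_round(w, theta, phi, phi_next, r,
                           gamma=0.99, lr_critic=0.1, lr_actor=0.05):
    """One synchronous round: consensus on critics, then local updates."""
    w = W @ w                             # gossip/consensus averaging of critics
    v, v_next = w @ phi, w @ phi_next     # per-agent linear value estimates
    delta = r + gamma * v_next - v        # per-agent TD errors (local rewards r)
    w = w + lr_critic * delta[:, None] * phi        # local TD(0) critic step
    # Placeholder actor direction: a real method would use the policy's
    # score function; here the feature vector stands in for it.
    theta = theta + lr_actor * delta[:, None] * phi
    return w, theta

phi = rng.normal(size=dim)       # synthetic shared-state features at s_t
phi_next = rng.normal(size=dim)  # features at s_{t+1}
r = rng.normal(size=n_agents)    # heterogeneous per-agent rewards
w1, theta1 = decentralized_ac_round(w0, theta0, phi, phi_next, r)
```

The consensus step is what removes the need for a central coordinator: agents only exchange parameters with graph neighbors, yet the doubly stochastic averaging provably shrinks their disagreement each round while local TD errors keep the updates reward-driven.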