🤖 AI Summary
This paper addresses efficient multi-agent coordination in reinforcement learning under communication constraints. The authors propose a decentralized actor-critic framework that combines local policy updates, multi-step local training, and sparse inter-agent information exchange, using multi-layer neural networks to approximate value functions—thereby reducing communication dependency. To the authors' knowledge, this is the first finite-time convergence analysis of such a scheme under Markovian sampling that explicitly quantifies how neural network approximation error affects convergence accuracy. Theoretically, the algorithm attains an ε-accurate stationary point with sample complexity O(ε⁻³) and communication complexity O(ε⁻¹τ⁻¹), where τ is the number of local training steps between communication rounds. Extensive experiments on cooperative control tasks validate the method's strong empirical performance and close agreement with the derived theoretical bounds.
📝 Abstract
In this paper, we study the problem of reinforcement learning in multi-agent systems where communication among agents is limited. We develop a decentralized actor-critic learning framework in which each agent performs several local updates of its policy and value function, where the latter is approximated by a multi-layer neural network, before exchanging information with its neighbors. This local training strategy substantially reduces the communication burden while maintaining coordination across the network. We establish a finite-time convergence analysis for the algorithm under Markovian sampling. Specifically, to attain an $\varepsilon$-accurate stationary point, the sample complexity is of order $\mathcal{O}(\varepsilon^{-3})$ and the communication complexity is of order $\mathcal{O}(\varepsilon^{-1}\tau^{-1})$, where $\tau$ denotes the number of local training steps. We also show how the final error bound depends on the neural network's approximation quality. Numerical experiments in a cooperative control setting illustrate and validate the theoretical findings.
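The core communication-saving pattern described above—each agent runs $\tau$ local gradient steps on its own parameters, then exchanges information with its neighbors once—can be illustrated with a minimal sketch. Everything here (the ring topology, the quadratic toy objective, the neighbor-averaging rule, and all variable names such as `tau` and `params`) is an illustrative assumption, not the paper's actual algorithm or tasks:

```python
import numpy as np

# Hypothetical toy setup: n_agents agents on a ring graph, each holding a
# parameter vector (think: critic weights), minimizing a noisy quadratic
# loss toward a shared optimum `target`. This only illustrates the
# "tau local steps, then one sparse exchange" schedule from the abstract.
rng = np.random.default_rng(0)
n_agents, dim, tau, rounds, lr = 4, 5, 3, 50, 0.1

params = [rng.normal(size=dim) for _ in range(n_agents)]
target = np.ones(dim)

for _ in range(rounds):
    # --- tau local training steps, no communication ---
    for i in range(n_agents):
        for _ in range(tau):
            # Noisy local gradient of 0.5 * ||params[i] - target||^2
            grad = params[i] - target + 0.01 * rng.normal(size=dim)
            params[i] -= lr * grad
    # --- one sparse exchange: average with ring neighbors ---
    new = []
    for i in range(n_agents):
        left = params[(i - 1) % n_agents]
        right = params[(i + 1) % n_agents]
        new.append((params[i] + left + right) / 3.0)
    params = new

# Agents should reach approximate consensus near the shared optimum.
disagreement = max(np.linalg.norm(p - params[0]) for p in params)
error = np.linalg.norm(params[0] - target)
print(disagreement, error)
```

Note the cost accounting this schedule implies: the communication rounds are the total iterations divided by $\tau$, which is where the $\mathcal{O}(\varepsilon^{-1}\tau^{-1})$ communication complexity in the abstract comes from.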