Generative Multi-Agent Q-Learning for Policy Optimization in Decentralized Wireless Networks

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the coordination challenge in decentralized wireless networks, where multiple transmitters and base stations lack global state information and thus struggle to align their optimization strategies, this paper proposes the multi-agent MEMQ (M-MEMQ) framework. M-MEMQ introduces a dual-mode state mechanism (coordinated/uncoordinated states), combining Bayesian joint-state estimation with a family of synthetically generated "digital cousin" environments that serve as digital twins of the real network. The result is a decentralized training and decentralized execution (DTDE) scheme that is low-overhead (communication cost linear in the number of transmitters), generalizable, and provably convergent. Compared with state-of-the-art multi-agent RL methods, M-MEMQ reduces average policy error by 55%, converges 35% faster, and cuts runtime and sample complexity by 50% and 45%, respectively, while approaching the performance of centralized methods at significantly lower computational and communication cost.

📝 Abstract
Q-learning is a widely used reinforcement learning (RL) algorithm for optimizing wireless networks, but faces challenges with large state-spaces. The recently proposed multi-environment mixed Q-learning (MEMQ) algorithm addresses these challenges by employing multiple Q-learning algorithms across multiple synthetically generated, distinct but structurally related environments, so-called digital cousins. In this paper, we propose a novel multi-agent MEMQ (M-MEMQ) for cooperative decentralized wireless networks with multiple networked transmitters (TXs) and base stations (BSs). TXs do not have access to global information (joint state and actions). The new concept of coordinated and uncoordinated states is introduced. In uncoordinated states, TXs act independently to minimize their individual costs and update local Q-functions. In coordinated states, TXs use a Bayesian approach to estimate the joint state and update the joint Q-functions. The cost of information-sharing scales linearly with the number of TXs and is independent of the joint state-action space size. Several theoretical guarantees, including deterministic and probabilistic convergence, bounds on estimation error variance, and the probability of misdetecting the joint states, are given. Numerical simulations show that M-MEMQ outperforms several decentralized and centralized training with decentralized execution (CTDE) multi-agent RL algorithms by achieving 55% lower average policy error (APE), 35% faster convergence, 50% reduced runtime complexity, and 45% less sample complexity. Furthermore, M-MEMQ achieves comparable APE with significantly lower complexity than centralized methods. Simulations validate the theoretical analyses.
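The digital-cousins idea behind MEMQ (running parallel Q-learners on synthetically generated, structurally related environments and mixing their estimates) can be sketched roughly as follows. This is an illustrative toy, not the paper's construction: the randomly drawn transition kernels, the reward table, and the uniform mixing weights are all assumptions made for the sketch.

```python
import numpy as np

N_STATES, N_ACTIONS, N_COUSINS = 5, 3, 4
ALPHA, GAMMA, STEPS = 0.1, 0.9, 3000
rng = np.random.default_rng(0)

def make_cousin(seed):
    """Synthesize a structurally related environment: identical rewards,
    independently drawn transition kernels. (Toy construction; the paper
    builds digital cousins from models of one real environment.)"""
    g = np.random.default_rng(seed)
    P = g.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
    R = np.tile(10.0 * np.arange(N_ACTIONS), (N_STATES, 1))
    return P, R

cousins = [make_cousin(k) for k in range(N_COUSINS)]
Qs = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_COUSINS)]
states = [0] * N_COUSINS

# One tabular Q-learner per cousin, driven by a random behavior policy
for _ in range(STEPS):
    for k, (P, R) in enumerate(cousins):
        s = states[k]
        a = int(rng.integers(N_ACTIONS))
        s_next = int(rng.choice(N_STATES, p=P[s, a]))
        target = R[s, a] + GAMMA * Qs[k][s_next].max()
        Qs[k][s, a] += ALPHA * (target - Qs[k][s, a])
        states[k] = s_next

# Fuse the ensemble estimates (uniform mixing here; MEMQ adapts the weights)
Q_mixed = np.mean(Qs, axis=0)
policy = Q_mixed.argmax(axis=1)
```

The mixing step is the point: each cousin's Q-table is a noisy estimate, and averaging across structurally related environments reduces estimation variance relative to a single Q-learner.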
Problem

Research questions and friction points this paper is trying to address.

Q-learning struggles with the large state-spaces of wireless network optimization.
Transmitters in decentralized networks lack global information (joint states and actions), making coordinated policy updates difficult.
Existing multi-agent RL methods suffer from high policy error, slow convergence, and heavy runtime and sample complexity.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent MEMQ (M-MEMQ) with coordinated and uncoordinated states for decentralized wireless networks
Bayesian estimation of the joint state in coordinated states
Information-sharing cost that scales linearly with the number of TXs, independent of the joint state-action space size
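The dual-mode update at the heart of M-MEMQ can be sketched as below. This is a toy for two TXs, not the paper's algorithm: the coordinated-state set, the Gaussian observation model standing in for the shared statistic, and the joint-index encoding are all assumptions made for illustration.

```python
import numpy as np

N_LOCAL, N_ACTIONS = 4, 2           # per-TX state and action space sizes
COORDINATED = {0, 1}                # local states flagged as coordinated (assumed)
ALPHA, GAMMA = 0.2, 0.9

Q_local = np.zeros((N_LOCAL, N_ACTIONS))                       # individual Q
Q_joint = np.zeros((N_LOCAL * N_LOCAL, N_ACTIONS * N_ACTIONS)) # joint Q (2 TXs)

prior = np.full(N_LOCAL, 1.0 / N_LOCAL)  # prior over the other TX's local state

def map_other_state(obs):
    """Bayesian (MAP) estimate of the other TX's local state from a noisy
    shared scalar -- a hypothetical stand-in for the paper's estimator."""
    lik = np.exp(-0.5 * (obs - np.arange(N_LOCAL)) ** 2)
    return int(np.argmax(prior * lik))

def update(s, a, cost, s_next, obs=None, a_other=0, s_next_other=0):
    if s in COORDINATED and obs is not None:
        # Coordinated state: estimate the joint state, update the joint Q
        s_other = map_other_state(obs)
        js, js_next = s * N_LOCAL + s_other, s_next * N_LOCAL + s_next_other
        ja = a * N_ACTIONS + a_other
        target = cost + GAMMA * Q_joint[js_next].min()  # cost-minimizing agents
        Q_joint[js, ja] += ALPHA * (target - Q_joint[js, ja])
    else:
        # Uncoordinated state: act on local information only
        target = cost + GAMMA * Q_local[s_next].min()
        Q_local[s, a] += ALPHA * (target - Q_local[s, a])

# Toy usage: one coordinated and one uncoordinated transition
update(s=0, a=1, cost=0.5, s_next=2, obs=2.9, a_other=1, s_next_other=3)
update(s=3, a=0, cost=1.0, s_next=1)
```

Only the low-dimensional shared observation crosses the air interface, which is why the information-sharing cost grows with the number of TXs rather than with the exponentially larger joint state-action space.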