A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization

📅 2024-09-24
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
In partially decentralized wireless networks, joint optimization among multiple mobile transmitters and base stations faces challenges including the absence of global state awareness, high communication overhead, and decision errors. Method: This paper proposes a Multi-Agent Multi-Environment Mixed Q-learning (MEMQ) framework featuring a novel coordinated/uncoordinated dual-mode mechanism that integrates Bayesian joint-state estimation with limited information sharing. Communication overhead scales linearly with the number of nodes and is independent of the joint state space. Contribution/Results: Theoretical analysis guarantees convergence. Experiments show that, compared to centralized MEMQ, the method is 50% faster with only a 20% increase in average policy error (APE); relative to state-of-the-art decentralized Q-learning approaches, it is 25% faster with 40% lower APE.

📝 Abstract
Q-learning is a powerful tool for network control and policy optimization in wireless networks, but it struggles with large state spaces. Recent advancements, like multi-environment mixed Q-learning (MEMQ), improve performance and reduce complexity by integrating multiple Q-learning algorithms across multiple related environments, so-called digital cousins. However, MEMQ is designed for centralized single-agent networks and is not suitable for decentralized or multi-agent networks. To address this challenge, we propose a novel multi-agent MEMQ algorithm for partially decentralized wireless networks with multiple mobile transmitters (TXs) and base stations (BSs), where TXs do not have access to each other's states and actions. In uncoordinated states, TXs act independently to minimize their individual costs. In coordinated states, TXs use a Bayesian approach to estimate the joint state based on local observations and share limited information with a leader TX to minimize the joint cost. The cost of information sharing scales linearly with the number of TXs and is independent of the joint state-action space size. The proposed scheme is 50% faster than centralized MEMQ with only a 20% increase in average policy error (APE) and is 25% faster than several advanced decentralized Q-learning algorithms with 40% less APE. The convergence of the algorithm is also demonstrated.
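The dual-mode mechanism in the abstract can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's algorithm: the class name, the two-table layout, and the placeholder joint-state "estimator" (which simply concatenates the local observation with the bits shared via the leader TX, standing in for the Bayesian estimator) are all assumptions.

```python
import random
from collections import defaultdict

class DualModeTX:
    """One transmitter (TX) agent: a local Q-table for uncoordinated
    states and a separate Q-table over estimated joint states for
    coordinated states. Illustrative sketch only."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, eps=0.1, rng=None):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.rng = rng or random.Random(0)
        # Q-tables map a (hashable) state to a list of action values.
        self.q_local = defaultdict(lambda: [0.0] * n_actions)
        self.q_joint = defaultdict(lambda: [0.0] * n_actions)

    def estimate_joint_state(self, local_obs, shared_bits):
        # Placeholder for the Bayesian joint-state estimator: pair the
        # local observation with the limited information shared through
        # the leader TX (hypothetical stand-in, not the paper's method).
        return (local_obs, shared_bits)

    def act(self, state, coordinated):
        # Epsilon-greedy over the active table; costs are minimized,
        # so the greedy choice is the action with the SMALLEST Q-value.
        q = self.q_joint if coordinated else self.q_local
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.n_actions)
        row = q[state]
        return min(range(self.n_actions), key=row.__getitem__)

    def update(self, state, action, cost, next_state, coordinated):
        # Standard tabular Q-learning step on whichever table is active,
        # with a min (not max) over next-state actions since we minimize cost.
        q = self.q_joint if coordinated else self.q_local
        target = cost + self.gamma * min(q[next_state])
        q[state][action] += self.alpha * (target - q[state][action])
```

In this sketch the per-step communication is only `shared_bits`, so its size is fixed per TX and the total sharing cost grows linearly with the number of TXs, independent of the joint state-action space, mirroring the scaling property claimed in the abstract.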
Problem

Research questions and friction points this paper is trying to address.

Decentralized Control
Wireless Networks
Multi-Agent Coordination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent MEMQ
Decentralized Cooperation
Efficiency Improvement