🤖 AI Summary
In partially decentralized wireless networks, joint optimization among multiple mobile transmitters and base stations faces challenges including the absence of global state awareness, high communication overhead, and decision errors. Method: This paper proposes a Multi-Agent Multi-Environment Mixed Q-learning (MEMQ) framework featuring a novel coordinated/uncoordinated dual-mode mechanism that integrates Bayesian joint-state estimation with limited information sharing. Communication overhead scales linearly with the number of nodes and is independent of the joint state space size. Contribution/Results: Theoretical analysis guarantees convergence. Experiments show that the method runs 50% faster than centralized MEMQ with only a 20% increase in average policy error (APE), and trains 25% faster than state-of-the-art decentralized Q-learning approaches while reducing APE by 40%.
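
The dual-mode mechanism can be pictured as each TX switching between a local greedy policy and a belief-driven query to a leader. The sketch below is a minimal illustration of that idea under assumed details: the class names (`TXAgent`, `Leader`), the set of coordinated states, the message format (a single MAP joint-state estimate), and the leader's joint Q-table are all hypothetical, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of the coordinated/uncoordinated dual-mode step.
# All names and data structures here are illustrative assumptions.

class Leader:
    """Leader TX holding a joint Q-table over joint states and joint actions."""
    def __init__(self, n_joint_states, n_joint_actions):
        self.joint_q = np.zeros((n_joint_states, n_joint_actions))

    def joint_action(self, joint_state_estimate):
        # Pick the joint action minimizing the estimated joint cost
        return int(np.argmin(self.joint_q[joint_state_estimate]))

class TXAgent:
    def __init__(self, n_local_states, n_actions, n_joint_states, coord_states):
        self.local_q = np.zeros((n_local_states, n_actions))  # individual-cost Q-table
        self.coord_states = set(coord_states)                 # states requiring coordination
        # Bayesian belief over the joint state, updated from local observations only
        self.belief = np.full(n_joint_states, 1.0 / n_joint_states)

    def update_belief(self, likelihood):
        # likelihood[j] = P(local observation | joint state j); one Bayes step
        posterior = self.belief * likelihood
        self.belief = posterior / posterior.sum()

    def act(self, local_state, leader):
        if local_state in self.coord_states:
            # Coordinated mode: share only a compact joint-state estimate
            # (a few scalars per TX, so signaling grows linearly in #TXs and
            # does not depend on the joint state-action space size)
            estimate = int(np.argmax(self.belief))
            return leader.joint_action(estimate)
        # Uncoordinated mode: act greedily on the local Q-table (costs -> argmin)
        return int(np.argmin(self.local_q[local_state]))

# Example: one TX querying the leader from a coordinated state
leader = Leader(n_joint_states=16, n_joint_actions=4)
tx = TXAgent(n_local_states=4, n_actions=4, n_joint_states=16, coord_states={2, 3})
tx.update_belief(likelihood=np.random.default_rng(0).random(16))
print(tx.act(local_state=2, leader=leader))  # coordinated: leader picks joint action
```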
📝 Abstract
Q-learning is a powerful tool for network control and policy optimization in wireless networks, but it struggles with large state spaces. Recent advancements, such as multi-environment mixed Q-learning (MEMQ), improve performance and reduce complexity by integrating multiple Q-learning algorithms across multiple related environments, so-called digital cousins. However, MEMQ is designed for centralized single-agent networks and is not suitable for decentralized or multi-agent networks. To address this challenge, we propose a novel multi-agent MEMQ algorithm for partially decentralized wireless networks with multiple mobile transmitters (TXs) and base stations (BSs), in which TXs do not have access to each other's states and actions. In uncoordinated states, TXs act independently to minimize their individual costs. In coordinated states, each TX uses a Bayesian approach to estimate the joint state from its local observations and shares limited information with a leader TX to minimize the joint cost. The cost of information sharing scales linearly with the number of TXs and is independent of the joint state-action space size. The proposed scheme is 50% faster than centralized MEMQ with only a 20% increase in average policy error (APE), and is 25% faster than several advanced decentralized Q-learning algorithms with 40% less APE. The convergence of the algorithm is also demonstrated.
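
For context, the core MEMQ idea of mixing tabular Q-updates across several related environments ("digital cousins") can be sketched as below. The mixing weights, the transition format, and the function name are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def mixed_q_update(q_tables, weights, transitions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step per 'digital cousin', then a weighted mix.

    q_tables:    list of (S, A) arrays, one per environment
    weights:     mixing weights summing to 1 (the weighting rule is an assumption)
    transitions: list of (s, a, cost, s_next) tuples, one per environment
    """
    for q, (s, a, c, s_next) in zip(q_tables, transitions):
        # Cost-minimizing tabular update: min over next actions
        td_target = c + gamma * q[s_next].min()
        q[s, a] += alpha * (td_target - q[s, a])
    # Ensemble estimate: convex combination of the per-environment tables
    return sum(w * q for w, q in zip(weights, q_tables))

# Example: three digital cousins of a 5-state, 3-action environment
qs = [np.zeros((5, 3)) for _ in range(3)]
mixed = mixed_q_update(qs, weights=[0.5, 0.3, 0.2],
                       transitions=[(0, 1, 2.0, 3), (0, 1, 1.5, 2), (1, 0, 0.5, 4)])
```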