AI Summary
To address the challenge of coordinating heterogeneous agents in distributed channel access for WLANs, this paper proposes QPMIX: the first provably convergent heterogeneous multi-agent reinforcement learning (MARL) framework in which value-based (e.g., Q-learning) and policy-based (e.g., PPO) agents are trained jointly under the centralized training with decentralized execution (CTDE) paradigm. The authors prove convergence for the case of linear value function approximation, and the training objective targets both throughput maximization and fairness among stations. Experimental results show that, under saturated traffic, QPMIX outperforms CSMA/CA, achieving higher throughput, lower mean delay and delay jitter, and a lower collision rate. It also remains robust in unsaturated and delay-sensitive traffic scenarios and promotes cooperation among heterogeneous agents. These results empirically support the effectiveness of heterogeneous agent coordination in dynamic wireless environments.
Abstract
This paper investigates the use of multi-agent reinforcement learning (MARL) to address distributed channel access in wireless local area networks. In particular, we consider the challenging yet more practical case where the agents heterogeneously adopt value-based or policy-based reinforcement learning algorithms to train their models. We propose a heterogeneous MARL training framework, named QPMIX, which adopts a centralized training with distributed execution paradigm to enable heterogeneous agents to collaborate. Moreover, we theoretically prove the convergence of the proposed heterogeneous MARL method when using linear value function approximation. Our method maximizes the network throughput and ensures fairness among stations, thereby enhancing the overall network performance. Simulation results demonstrate that the proposed QPMIX algorithm improves throughput, mean delay, delay jitter, and collision rates compared with conventional carrier-sense multiple access with collision avoidance in the saturated traffic scenario. Furthermore, QPMIX is shown to be robust in unsaturated and delay-sensitive traffic scenarios and to promote cooperation among heterogeneous agents.
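To make the phrase "linear value function approximation" concrete, the sketch below shows a generic textbook semi-gradient Q-learning step for a linear Q-function, Q(s, a) = w · φ(s, a). It is not the QPMIX algorithm and does not reproduce the paper's mixing scheme or convergence proof. The function names, the feature vectors, and the step-size and discount values are assumptions chosen for illustration.

```python
import numpy as np


def q_value(weights, features):
    """Linear approximation: Q(s, a) = w . phi(s, a)."""
    return float(np.dot(weights, features))


def td_update(weights, phi_sa, reward, phi_next_candidates, alpha=0.1, gamma=0.9):
    """One semi-gradient Q-learning step for a linear Q-function.

    phi_sa: feature vector phi(s, a) of the action just taken (hypothetical features).
    phi_next_candidates: feature vectors phi(s', a') for every action a' in the next state.
    alpha, gamma: step size and discount factor (illustrative values).
    """
    # Greedy bootstrap target over the next state's actions.
    next_q = max(q_value(weights, phi) for phi in phi_next_candidates)
    td_error = reward + gamma * next_q - q_value(weights, phi_sa)
    # For a linear model, the gradient of Q with respect to w is phi(s, a).
    return weights + alpha * td_error * phi_sa


# Toy usage: two one-hot features, e.g. "transmit" and "wait" (hypothetical actions).
w = np.zeros(2)
phi_transmit = np.array([1.0, 0.0])
phi_wait = np.array([0.0, 1.0])
w = td_update(w, phi_transmit, reward=1.0, phi_next_candidates=[phi_transmit, phi_wait])
```

With a linear model the update direction is simply the feature vector of the chosen action, which is the kind of structure that convergence analyses of value-based methods typically rely on.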