An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management

📅 2025-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the multi-access point (AP) coordinated resource scheduling problem in wireless networks. To jointly optimize the weighted sum rate and tail rate of user equipment, we propose the first offline multi-agent reinforcement learning (MARL) framework specifically designed for wireless resource management. Methodologically, we develop an offline training mechanism compatible with the centralized training with decentralized execution (CTDE) paradigm, integrating behavior cloning and conservative Q-learning to ensure policy performance, decentralized execution capability, and computational efficiency. Simulation results demonstrate that our approach improves a weighted combination of sum and tail rates by over 15% compared to conventional online MARL and heuristic methods, while significantly reducing training overhead and maintaining practical deployability. The core contribution is the first systematic application of offline MARL to wireless resource management, establishing a lightweight, robust, and scalable CTDE-based offline optimization paradigm.

📝 Abstract
Offline multi-agent reinforcement learning (MARL) addresses key limitations of online MARL, such as safety concerns, expensive data collection, extended training intervals, and high signaling overhead caused by online interactions with the environment. In this work, we propose an offline MARL algorithm for radio resource management (RRM), focusing on optimizing scheduling policies for multiple access points (APs) to jointly maximize the sum and tail rates of user equipment (UEs). We evaluate three training paradigms: centralized, independent, and centralized training with decentralized execution (CTDE). Our simulation results demonstrate that the proposed offline MARL framework outperforms conventional baseline approaches, achieving over a 15% improvement in a weighted combination of sum and tail rates. Additionally, the CTDE framework strikes an effective balance, reducing the computational complexity of centralized methods while addressing the inefficiencies of independent training. These results underscore the potential of offline MARL to deliver scalable, robust, and efficient solutions for resource management in dynamic wireless networks.
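As a rough illustration of the training objective described above, a per-agent loss combining a TD error with a conservative Q-learning regularizer and a behavior-cloning term might look like the sketch below. The paper does not publish its exact loss, so the function name, weighting coefficients `alpha` and `beta`, and the toy inputs are all assumptions, not the authors' implementation.

```python
import numpy as np

def cql_bc_loss(q_values, action, target, policy_logits, alpha=1.0, beta=0.1):
    """Sketch of one agent's offline loss: TD error + CQL + behavior cloning.

    q_values:      (A,) Q-estimates for every action in the current state
    action:        index of the action taken in the offline dataset
    target:        scalar Bellman target r + gamma * max_a' Q_target(s', a')
    policy_logits: (A,) logits of the agent's decentralized policy head
    alpha, beta:   assumed weights for the conservative and BC regularizers
    """
    # Standard TD error on the logged (dataset) action
    td_loss = (q_values[action] - target) ** 2
    # CQL regularizer: penalize high Q on out-of-distribution actions
    # while keeping Q high on the action actually seen in the data
    logsumexp = np.log(np.sum(np.exp(q_values)))
    cql_term = logsumexp - q_values[action]
    # Behavior cloning: negative log-likelihood of the dataset action
    log_probs = policy_logits - np.log(np.sum(np.exp(policy_logits)))
    bc_term = -log_probs[action]
    return td_loss + alpha * cql_term + beta * bc_term
```

Under the CTDE paradigm, such a loss would be minimized centrally over the offline dataset for each AP's agent, while execution at deployment time uses only the local policy head.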
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Reinforcement Learning
Wireless Network Resource Management
Mobile User Rate Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline Multi-Agent Reinforcement Learning
Wireless Resource Management
Centralized Training Decentralized Execution (CTDE)