An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management

📅 2025-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the multi-access point (AP) coordinated resource scheduling problem in wireless networks. To jointly optimize the weighted sum rate and tail rate of user equipment, we propose the first offline multi-agent reinforcement learning (MARL) framework specifically designed for wireless resource management. Methodologically, we develop an offline training mechanism compatible with the centralized training with decentralized execution (CTDE) paradigm, integrating behavior cloning and conservative Q-learning to ensure policy performance, decentralized execution capability, and computational efficiency. Simulation results demonstrate that our approach improves a weighted combination of sum and tail rates by over 15% compared to conventional online MARL and heuristic methods, while significantly reducing training overhead and maintaining practical deployability. The core contribution is the first systematic application of offline MARL to wireless resource management, establishing a lightweight, robust, and scalable CTDE-based offline optimization paradigm.

📝 Abstract
Offline multi-agent reinforcement learning (MARL) addresses key limitations of online MARL, such as safety concerns, expensive data collection, extended training intervals, and high signaling overhead caused by online interactions with the environment. In this work, we propose an offline MARL algorithm for radio resource management (RRM), focusing on optimizing scheduling policies for multiple access points (APs) to jointly maximize the sum and tail rates of user equipment (UEs). We evaluate three training paradigms: centralized, independent, and centralized training with decentralized execution (CTDE). Our simulation results demonstrate that the proposed offline MARL framework outperforms conventional baseline approaches, achieving over a 15% improvement in a weighted combination of sum and tail rates. Additionally, the CTDE framework strikes an effective balance, reducing the computational complexity of centralized methods while addressing the inefficiencies of independent training. These results underscore the potential of offline MARL to deliver scalable, robust, and efficient solutions for resource management in dynamic wireless networks.
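As a rough illustration of the training objective described above, a per-agent loss combining a TD error with a conservative Q-learning regularizer and a behavior-cloning term might look like the sketch below. The paper does not publish its exact loss, so the function name, weighting coefficients `alpha` and `beta`, and the toy inputs are all assumptions, not the authors' implementation.

```python
import numpy as np

def cql_bc_loss(q_values, action, target, policy_logits, alpha=1.0, beta=0.1):
    """Sketch of one agent's offline loss: TD error + CQL + behavior cloning.

    q_values:      (A,) Q-estimates for every action in the current state
    action:        index of the action taken in the offline dataset
    target:        scalar Bellman target r + gamma * max_a' Q_target(s', a')
    policy_logits: (A,) logits of the agent's decentralized policy head
    alpha, beta:   assumed weights for the conservative and BC regularizers
    """
    # Standard TD error on the logged (dataset) action
    td_loss = (q_values[action] - target) ** 2
    # CQL regularizer: penalize high Q on out-of-distribution actions
    # while keeping Q high on the action actually seen in the data
    logsumexp = np.log(np.sum(np.exp(q_values)))
    cql_term = logsumexp - q_values[action]
    # Behavior cloning: negative log-likelihood of the dataset action
    log_probs = policy_logits - np.log(np.sum(np.exp(policy_logits)))
    bc_term = -log_probs[action]
    return td_loss + alpha * cql_term + beta * bc_term
```

Under the CTDE paradigm, such a loss would be minimized centrally over the offline dataset for each AP's agent, while execution at deployment time uses only the local policy head.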
Problem

Research questions and friction points this paper is trying to address.

Multi-Agent Reinforcement Learning
Wireless Network Resource Management
Mobile User Rate Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Offline Multi-Agent Reinforcement Learning
Wireless Resource Management
Centralized Training Decentralized Execution (CTDE)