Hierarchical Cooperative MARL for Joint Downlink PRB and Power Allocation in a 5G System

📅 2026-05-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the joint optimization of downlink physical resource block (PRB) allocation and transmit power in 5G OFDMA systems with a hierarchical cooperative multi-agent reinforcement learning (MARL) framework. The control task is split into two sequential stages. First, a PRB agent learns user-level resource shares, which a deterministic channel-aware quota resolver maps to concrete PRB assignments. Then a power agent distributes the base-station power budget across users and their assigned resources. The method combines this hierarchical MARL architecture with a three-phase curriculum (PRB allocation, power control, joint fine-tuning) and a cross-layer feedback loop (adaptive modulation and coding, HARQ, outer-loop link adaptation). It runs within the Sionna system-level simulator, using ray-traced channels generated with Sionna RT. Compared with a proportional-fair (PF) scheduler using equal-power transmission, the full controller achieves the largest cell-throughput gain with only a modest reduction in Jain's fairness index, and each learned component on its own also improves the throughput distribution relative to PF.
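The summary does not say how the quota resolver rounds shares or matches users to PRBs. As a rough illustration only, the Python sketch below assumes largest-remainder rounding of the shares into integer PRB quotas, followed by a greedy channel-aware matching that fills each user's quota with the PRBs on which it has the best channel quality. The function names `resolve_prb_quotas` and `assign_prbs` and the `channel_quality` matrix (users × PRBs, e.g. per-PRB SINR) are hypothetical, not the authors' implementation.

```python
import numpy as np


def resolve_prb_quotas(shares, n_prb):
    """Round continuous PRB shares into integer quotas that sum to n_prb.

    Hypothetical largest-remainder rule; the paper's actual rule is not specified.
    """
    shares = np.clip(np.asarray(shares, dtype=float), 0.0, None)
    total = shares.sum()
    # Fall back to an equal split if the agent outputs all-zero shares
    shares = shares / total if total > 0 else np.full(shares.shape, 1.0 / shares.size)
    ideal = shares * n_prb
    quotas = np.floor(ideal).astype(int)
    leftover = n_prb - int(quotas.sum())
    # Give the leftover PRBs to the users with the largest fractional parts
    order = np.argsort(-(ideal - quotas), kind="stable")
    quotas[order[:leftover]] += 1
    return quotas


def assign_prbs(quotas, channel_quality):
    """Map PRBs to users greedily by channel quality, respecting each user's quota.

    channel_quality: array of shape (n_users, n_prb), e.g. per-PRB SINR (assumed input).
    Returns an array of length n_prb holding the user index served on each PRB.
    """
    channel_quality = np.asarray(channel_quality, dtype=float)
    remaining = np.array(quotas, dtype=int)
    assignment = np.full(channel_quality.shape[1], -1, dtype=int)
    # Visit (user, PRB) pairs from best to worst channel quality
    for flat_idx in np.argsort(-channel_quality, axis=None, kind="stable"):
        user, prb = np.unravel_index(flat_idx, channel_quality.shape)
        if assignment[prb] == -1 and remaining[user] > 0:
            assignment[prb] = user
            remaining[user] -= 1
    return assignment
```

Because the quotas always sum to the number of PRBs, the greedy pass assigns every PRB exactly once and gives each user exactly its quota. A deterministic step of this kind lets the agent act in a continuous share space while still producing a valid combinatorial assignment.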

📝 Abstract

Efficient downlink radio resource management in 5G requires jointly optimizing user scheduling and transmit-power allocation under time-varying wireless conditions. This is challenging in OFDMA systems because PRB assignment is combinatorial, power allocation is continuous, and performance depends on channel evolution, link adaptation, and long-term fairness. We propose a hierarchical cooperative multi-agent reinforcement learning framework with staged curriculum training for joint downlink PRB and power allocation in a physically grounded 5G environment. System-level simulation is implemented in Sionna, while Sionna RT supports wireless scene construction and mobility-aware ray-traced channel generation. The control task is decomposed into two sequential stages: a PRB agent learns user-level resource shares, which are converted to exact PRB assignments by a deterministic channel-aware quota resolver, and a power agent distributes the base-station power budget across users and their assigned PRB-symbol resources. The framework operates in a cross-layer loop with adaptive modulation and coding, HARQ feedback, outer-loop link adaptation, and a fairness-aware reward based on smoothed throughput and Jain's fairness index. Training stability is improved through a three-phase curriculum for PRB allocation, power control, and joint fine-tuning. Under matched channel realizations, we compare against a proportional-fair (PF) scheduler with equal-power transmission and two ablations isolating the learned PRB and power-control components. Results show that both learned components improve throughput distribution relative to PF, while the full PRB and power controller achieves the largest cell-throughput gain with only a modest reduction in Jain's fairness index.
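The abstract names two ingredients that are easy to make concrete: a fairness-aware reward built from smoothed throughput and Jain's fairness index, and the PF baseline. A minimal Python sketch follows. Jain's index J(x) = (Σx)² / (n·Σx²) is standard, and the PF rule shown (serve, on each PRB, the user with the largest ratio of achievable rate to smoothed throughput) is the textbook form. The smoothing factor `alpha`, the additive combination with `fairness_weight`, and the throughput normalization `scale` are assumptions for illustration, not the paper's reward.

```python
import numpy as np


def jain_index(x):
    """Jain's fairness index (sum x)^2 / (n * sum x^2); ranges from 1/n to 1."""
    x = np.asarray(x, dtype=float)
    denom = x.size * np.sum(x ** 2)
    # Convention (assumed): an all-zero vector counts as perfectly fair
    return float(np.sum(x) ** 2 / denom) if denom > 0 else 1.0


def smooth_throughput(avg_thr, inst_thr, alpha=0.05):
    """Exponentially weighted moving average of per-user throughput."""
    avg = np.asarray(avg_thr, dtype=float)
    inst = np.asarray(inst_thr, dtype=float)
    return (1.0 - alpha) * avg + alpha * inst


def fairness_aware_reward(avg_thr, inst_thr, alpha=0.05, fairness_weight=1.0, scale=1e6):
    """Hypothetical reward: normalized cell throughput plus weighted Jain index of smoothed throughput."""
    new_avg = smooth_throughput(avg_thr, inst_thr, alpha)
    cell_term = float(np.sum(inst_thr)) / scale
    return cell_term + fairness_weight * jain_index(new_avg), new_avg


def pf_schedule(rates, avg_thr, eps=1e-9):
    """Textbook PF rule: on each PRB serve argmax_u rates[u, prb] / avg_thr[u].

    rates: achievable rate per (user, PRB) under equal power, shape (n_users, n_prb).
    """
    avg = np.maximum(np.asarray(avg_thr, dtype=float), eps)
    metric = np.asarray(rates, dtype=float) / avg[:, None]
    return np.argmax(metric, axis=0)
```

Computing Jain's index on smoothed rather than per-slot throughput rewards long-term fairness, which mirrors how PF trades instantaneous rate against average throughput.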

Problem

Research questions and friction points this paper is trying to address.

downlink resource allocation

PRB allocation

power allocation

fairness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical MARL

Joint PRB and Power Allocation

Curriculum Training

Cross-layer Optimization
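The cross-layer loop described in the abstract couples HARQ feedback with outer-loop link adaptation (OLLA). A common OLLA formulation, sketched below in Python under the assumption that the framework uses something similar, keeps an SINR back-off in dB that grows by a fixed step on each NACK and shrinks by step·t/(1 - t) on each ACK. The offset is then stable only when the observed block error rate equals the target t. The class `OuterLoopLA`, the helper `select_mcs`, and any SINR threshold table passed to it are hypothetical and are not Sionna's API.

```python
from bisect import bisect_right


class OuterLoopLA:
    """Generic OLLA: adjust an SINR back-off from HARQ ACK/NACK feedback."""

    def __init__(self, bler_target=0.1, step_nack_db=0.5, offset_db=0.0):
        self.bler_target = bler_target
        self.step_nack_db = step_nack_db
        # Chosen so that, at the target BLER, expected up and down moves cancel
        self.step_ack_db = step_nack_db * bler_target / (1.0 - bler_target)
        self.offset_db = offset_db

    def update(self, ack):
        """Back off after a NACK; become more aggressive after an ACK."""
        if ack:
            self.offset_db -= self.step_ack_db
        else:
            self.offset_db += self.step_nack_db

    def effective_sinr_db(self, sinr_est_db):
        """Reported SINR corrected by the current back-off."""
        return sinr_est_db - self.offset_db


def select_mcs(sinr_db, thresholds_db):
    """Highest MCS index whose (ascending) SINR threshold is met; falls back to 0."""
    return max(bisect_right(thresholds_db, sinr_db) - 1, 0)
```

In this sketch, one NACK followed by nine ACKs (a 10% error rate, equal to the default target) returns the offset to zero, which is the equilibrium behavior OLLA is designed to have.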