MORPH: Multi-Environment Orchestrated Reinforcement Learning for PRB Handling in O-RAN

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the instability of reinforcement learning (RL) policies in O-RAN that arises when the throughput signal used for training is distorted. It proposes the MORPH framework, which combines three sources of throughput information: application-layer measurements (iPerf), empirical MCS-selection distributions from link adaptation, and a 3GPP-parameterized PHY-level OFDM simulator. Combining these sources allows a single RL training pipeline to span heterogeneous environments. Implemented on the OpenAirInterface 5G-NR platform, MORPH performs slice-aware, PRB-level dynamic spectrum allocation and slice isolation within a single gNB. In the reported experiments, MORPH yields more robust policies, more stable slice-level performance, and better service-level agreement (SLA) compliance across heterogeneous slicing scenarios than RL trained on a single throughput source.
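One way to picture training across heterogeneous environments is to draw the throughput source for each episode from a set of mixing weights. The summary does not say how MORPH schedules or fuses its sources, so the sketch below is an illustrative assumption only: the function `make_env_sampler`, the source names, and the weights are all hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

# A throughput source maps (allocated PRBs, path loss in dB) to bits per second.
ThroughputFn = Callable[[int, float], float]


def make_env_sampler(
    sources: Dict[str, ThroughputFn],
    weights: Dict[str, float],
    seed: int = 0,
) -> Callable[[], Tuple[str, ThroughputFn]]:
    """Return a function that picks one throughput source per episode.

    Hypothetical helper: sources are drawn with probability proportional
    to their weight, so slow measured sources can be sampled less often
    than cheap simulated ones.
    """
    names = sorted(sources)
    probs = [weights[name] for name in names]
    if any(p < 0 for p in probs) or sum(probs) <= 0:
        raise ValueError("weights must be non-negative and sum to a positive value")
    rng = random.Random(seed)

    def sample() -> Tuple[str, ThroughputFn]:
        name = rng.choices(names, weights=probs, k=1)[0]
        return name, sources[name]

    return sample


if __name__ == "__main__":
    # Placeholder sources standing in for the three kinds named in the summary.
    demo_sources = {
        "oai_iperf": lambda n_prb, pl_db: 0.0,      # measured on the stack
        "mcs_estimator": lambda n_prb, pl_db: 0.0,  # distribution-aware estimate
        "phy_sim": lambda n_prb, pl_db: 0.0,        # PHY-level simulator
    }
    demo_weights = {"oai_iperf": 0.2, "mcs_estimator": 0.3, "phy_sim": 0.5}
    pick = make_env_sampler(demo_sources, demo_weights, seed=1)
    print([pick()[0] for _ in range(5)])
```

The weights here are arbitrary; in practice they would trade off the cost of measured episodes against the fidelity gap of simulated ones.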

📝 Abstract

Reinforcement-learning (RL) solutions for dynamic spectrum access and radio resource management in Open Radio Access Networks (O-RAN) depend critically on the fidelity of the throughput signal used for training. Analytical or physical-layer (PHY)-only simulators scale well but often miss protocol-stack effects such as signaling overhead and retransmissions, whereas exhaustive throughput profiling on a standards-compliant 5G stack is slow and can be unstable under software execution constraints. This paper presents MORPH, a measurement-grounded multi-environment RL pipeline for slice-aware PRB-level spectrum allocation (spectrum sharing and slice isolation within a single gNB), built on the RF-simulator mode of the OpenAirInterface (OAI) 5G-NR stack. MORPH leverages three complementary throughput sources: (i) application-layer throughput measured via iPerf on the OAI stack under controlled AWGN path-loss settings, (ii) empirical MCS-selection distributions conditioned on path loss, enabling a distribution-aware theoretical throughput estimator that reflects standards-compliant link adaptation, and (iii) scalable throughput estimates from a 3GPP-parameterized PHY-fidelity OFDM simulator. Using these components, we train and compare agents that differ only in the origin of their throughput feedback: an OAI-grounded practical agent, a simulator-driven agent, and MORPH, which fuses real and synthetic throughput signals for policy optimization. Evaluation on the OAI execution harness across heterogeneous slicing scenarios shows that MORPH yields more robust slice-wise performance and improved SLA compliance than single-source training, providing a practical foundation for PRB-level spectrum sharing and slice isolation within a single-cell stack and a stepping stone toward multi-cell spectrum coordination and interference management.
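The distribution-aware estimator in item (ii) can be read as an expectation: weight the rate achievable at each MCS by how often link adaptation selected that MCS at a given path loss. The sketch below is a minimal illustration under stated assumptions, not the paper's estimator. The per-PRB rate formula is a simplified approximation rather than the 3GPP transport-block-size procedure, and the function names, the 14% overhead, the 0.5 ms slot, and the example MCS entries and frequencies are all hypothetical placeholders.

```python
from typing import Dict, Tuple

SUBCARRIERS_PER_PRB = 12  # fixed by the 5G NR resource grid


def prb_rate_bps(
    qm: int,
    code_rate: float,
    n_prb: int,
    symbols_per_slot: int = 14,
    slot_s: float = 0.5e-3,   # assumed slot duration (30 kHz subcarrier spacing)
    overhead: float = 0.14,   # assumed fraction of resource elements lost to overhead
) -> float:
    """Approximate rate for one MCS: resource elements x bits per element / slot time."""
    re_per_slot = n_prb * SUBCARRIERS_PER_PRB * symbols_per_slot
    return re_per_slot * qm * code_rate * (1.0 - overhead) / slot_s


def expected_throughput_bps(
    mcs_freq: Dict[int, float],
    mcs_table: Dict[int, Tuple[int, float]],
    n_prb: int,
) -> float:
    """Expected throughput under an empirical MCS distribution.

    mcs_freq  : MCS index -> observed count or probability at one path loss
    mcs_table : MCS index -> (modulation order Qm, code rate R)
    """
    total = sum(mcs_freq.values())
    if total <= 0:
        raise ValueError("MCS frequencies must sum to a positive value")
    return sum(
        (freq / total) * prb_rate_bps(*mcs_table[mcs], n_prb)
        for mcs, freq in mcs_freq.items()
    )


if __name__ == "__main__":
    # Illustrative placeholder values, not measurements from the paper.
    table = {9: (2, 0.66), 16: (4, 0.64), 28: (6, 0.93)}
    freq_at_some_path_loss = {9: 10, 16: 70, 28: 20}
    for prbs in (10, 25, 50):
        mbps = expected_throughput_bps(freq_at_some_path_loss, table, prbs) / 1e6
        print(f"{prbs} PRBs: {mbps:.2f} Mbit/s")
```

Because the estimate is linear in the number of PRBs, an agent can query it cheaply for any candidate slice allocation once the per-path-loss MCS frequencies have been collected.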

Problem

Research questions and friction points this paper is trying to address.

O-RAN

Reinforcement Learning

PRB Allocation

Spectrum Sharing

Throughput Estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-environment Reinforcement Learning

PRB-level Spectrum Allocation

O-RAN