🤖 AI Summary
This work addresses the intelligent mapping of O-RAN functional units, specifically the O-CU and O-DU, onto O-Cloud resources in sixth-generation (6G) open radio access networks. The problem is formulated as a sequential decision-making optimization task, and the study applies reinforcement learning to cloud resource mapping for O-RAN network slicing. To this end, the authors propose SliceMapper, an rApp architecture supporting multiple Q-learning variants: on-policy and off-policy algorithms, each implemented with both tabular representation and function approximation. Simulation results show that the on-policy function-approximation method achieves the best stability (lowest standard deviation across random seeds), while the on-policy and off-policy tabular methods attain the highest average rewards (5.42 and 5.12, respectively), supporting the effectiveness and practicality of the proposed solution.
📝 Abstract
In this paper, we propose an rApp, named SliceMapper, to optimize the mapping of the open centralized unit (O-CU) and open distributed unit (O-DU) of an open radio access network (O-RAN) slice subnet onto the underlying open cloud (O-Cloud) sites in sixth-generation (6G) O-RAN. To accomplish this, we first design a system model for SliceMapper and introduce its mathematical framework. Next, we formulate the mapping process addressed by SliceMapper as a sequential decision-making optimization problem. To solve this problem, we implement both on-policy and off-policy variants of the Q-learning algorithm, employing tabular representation as well as function approximation for each variant. To evaluate the effectiveness of these approaches, we conduct a series of simulations under various scenarios and then perform a comparative analysis of all four variants. The results demonstrate that the on-policy function approximation method outperforms the alternatives in terms of stability, exhibiting the lowest standard deviation across all random seeds. However, the on-policy and off-policy tabular representation methods achieve higher average rewards, with values of 5.42 and 5.12, respectively. Finally, we conclude the paper and outline several directions for future research.
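To make the on-policy versus off-policy distinction concrete, the sketch below contrasts the two tabular update rules (SARSA-style on-policy and Q-learning-style off-policy) on a toy placement task. The environment, capacities, and reward here are illustrative assumptions only; the paper's actual state space, action space, and reward function for mapping O-CU/O-DU units onto O-Cloud sites are not reproduced.

```python
import random
from collections import defaultdict

# Hypothetical stand-in environment: each step places one functional
# unit onto one of N_SITES O-Cloud sites. Reward +1 if the site still
# has capacity, -1 otherwise. All numbers are assumptions.
N_SITES = 3
CAPACITY = [2, 2, 1]

def step(load, site):
    """Place a unit on `site`; return the new load tuple and a reward."""
    new_load = list(load)
    new_load[site] += 1
    r = 1.0 if new_load[site] <= CAPACITY[site] else -1.0
    return tuple(new_load), r

def eps_greedy(Q, s, eps=0.1):
    """Epsilon-greedy action selection over the site choices."""
    if random.random() < eps:
        return random.randrange(N_SITES)
    return max(range(N_SITES), key=lambda a: Q[(s, a)])

def train(on_policy, episodes=500, alpha=0.1, gamma=0.9):
    """Tabular TD learning; on_policy=True -> SARSA, False -> Q-learning."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = (0,) * N_SITES
        a = eps_greedy(Q, s)
        for _ in range(4):  # place four units per episode
            s2, r = step(s, a)
            a2 = eps_greedy(Q, s2)
            if on_policy:
                # On-policy target: bootstrap from the action actually taken
                target = r + gamma * Q[(s2, a2)]
            else:
                # Off-policy target: bootstrap from the greedy action
                target = r + gamma * max(Q[(s2, b)] for b in range(N_SITES))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s2, a2
    return Q
```

The only difference between the two variants is the bootstrap target; the function-approximation variants studied in the paper would replace the table `Q` with a parameterized estimator updated by the same targets.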