🤖 AI Summary
This work addresses the challenging joint optimization of power and channel allocation in downlink non-orthogonal multiple access (NOMA) systems by proposing a novel deep reinforcement learning framework that integrates experience replay with an on-policy strategy. The approach explicitly models channel assignment while enabling efficient power allocation through a generalizable learning mechanism, thereby enhancing the system’s adaptability to dynamic environments. Comprehensive simulations systematically evaluate the impact of key hyperparameters—including learning rate, batch size, network architecture, and state feature dimensionality—on algorithmic performance. The results demonstrate that the proposed method significantly improves both resource utilization efficiency and overall system performance compared to existing approaches.
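The central idea, combining experience replay with an on-policy update, can be illustrated with a minimal sketch. This is not the authors' implementation: the network sizes, the joint (channel, power) action encoding, and the placeholder reward are all illustrative assumptions; stored transitions are re-evaluated under the current policy before each gradient step.

```python
import random
from collections import deque

import torch
import torch.nn as nn
from torch.distributions import Categorical

N_USERS, N_CHANNELS, N_POWER_LEVELS = 4, 2, 4
STATE_DIM = N_USERS * N_CHANNELS          # e.g. per-user channel gains (assumed)
N_ACTIONS = N_CHANNELS * N_POWER_LEVELS   # joint (channel, power) choice (assumed)

# Small policy network; the paper's actual architecture is not specified here.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, N_ACTIONS),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
replay = deque(maxlen=1024)               # experience replay memory

def sum_rate_reward(state, action):
    """Placeholder reward; a real system would compute the NOMA sum rate
    after successive interference cancellation (SIC)."""
    return torch.randn(())                # illustrative stand-in only

for step in range(200):
    state = torch.randn(STATE_DIM)        # stand-in for observed channel gains
    dist = Categorical(logits=policy(state))
    action = dist.sample()                # joint channel/power decision
    replay.append((state, action, sum_rate_reward(state, action)))

    if len(replay) >= 32:
        batch = random.sample(replay, 32)
        # Re-evaluate stored states under the *current* policy, so the
        # gradient is taken on-policy even though transitions are replayed.
        loss = -torch.stack([
            Categorical(logits=policy(s)).log_prob(a) * r
            for s, a, r in batch
        ]).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```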
📝 Abstract
In recent years, the Non-Orthogonal Multiple Access (NOMA) system has emerged as a promising candidate among multiple access frameworks, and the evolution of deep machine learning has motivated active efforts to incorporate it into NOMA systems. The main driver of these studies is the growing need to optimize the utilization of network resources, as the expansion of the Internet of Things (IoT) has made such resources scarce. NOMA addresses this need through power-domain multiplexing, allowing multiple users to access the network simultaneously. Nevertheless, the NOMA system has a few limitations. Several methods have been proposed to mitigate them, including the optimization of power allocation known as the joint resource allocation (JRA) method, and the integration of JRA with deep reinforcement learning (JRA-DRL). Despite these efforts, the channel assignment problem remains open and requires further investigation. In this paper, we propose a deep reinforcement learning framework that incorporates replay memory into an on-policy algorithm to allocate network resources in a NOMA system and generalize the learning. We also provide extensive simulations to evaluate the effects of varying the learning rate, batch size, type of model, and number of features in the state.
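The simulation study described above amounts to a grid sweep over the four named hyperparameters. A hedged sketch follows; the grid values and the `train_and_evaluate` routine are hypothetical stand-ins, not taken from the paper:

```python
from itertools import product

# Illustrative grids; the paper's actual values are not given in the abstract.
learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [16, 32, 64]
model_types = ["mlp", "lstm"]             # candidate network architectures
state_feature_counts = [4, 8, 16]         # number of features in the state

def train_and_evaluate(lr, batch_size, model_type, n_features):
    """Hypothetical stand-in: train the DRL agent with this configuration
    and return a performance score (e.g. achieved sum rate)."""
    return 0.0                            # placeholder result

results = {
    cfg: train_and_evaluate(*cfg)
    for cfg in product(learning_rates, batch_sizes,
                       model_types, state_feature_counts)
}
best = max(results, key=results.get)
print("best configuration:", best)
```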