🤖 AI Summary
This paper addresses the joint optimization of time allocation and power control in simultaneous wireless information and power transfer (SWIPT)-enabled cognitive Internet of Things (CIoT) networks. Under practical constraints—including small-scale fading, realistic energy harvesting dynamics, and stringent interference limits—the problem is formulated as a Markov decision process (MDP). To enable end-to-end autonomous resource coordination, the authors propose a Double Deep Q-Network with Upper Confidence Bound (DDQN-UCB) algorithm, which they present as the first to integrate UCB-based exploration into DDQN for CIoT resource management. The method jointly maximizes throughput and energy efficiency, thereby significantly extending network lifetime. Simulation results demonstrate that, compared to state-of-the-art deep reinforcement learning approaches, the proposed algorithm improves system throughput by 18.7% and increases average node lifetime by 23.4%, while satisfying strict real-time and energy-efficiency requirements.
📝 Abstract
This letter presents a novel deep reinforcement learning (DRL) approach for joint time allocation and power control in a cognitive Internet of Things (CIoT) system with simultaneous wireless information and power transfer (SWIPT). The CIoT transmitter autonomously manages energy harvesting (EH) and transmissions using a learnable time-switching factor while optimizing transmit power to enhance throughput and network lifetime. The joint optimization is modeled as a Markov decision process under small-scale fading, realistic EH, and interference constraints. We develop a double deep Q-network (DDQN) enhanced with an upper confidence bound (UCB) exploration strategy. Simulation results show that the proposed approach outperforms existing DRL methods.
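To make the core idea concrete, the following is a minimal sketch of the two ingredients named above: UCB-based action selection (replacing ε-greedy exploration) and the double-DQN target that decouples action selection from evaluation. All function names, the action-count bookkeeping, and the exploration constant `c` are illustrative assumptions, not the paper's actual implementation; in the paper the discrete actions would correspond to candidate time-switching factors and power levels.

```python
import numpy as np

def ucb_action(q_values, counts, t, c=2.0):
    """Pick an action by Q-estimate plus a UCB exploration bonus.

    q_values : current Q-estimates for each discrete action
               (e.g. time-switching / power-level combinations)
    counts   : how many times each action has been selected so far
    t        : current time step (total selections made)
    c        : exploration weight (assumed hyperparameter)
    """
    # Rarely tried actions get a larger bonus; untried ones are forced.
    bonus = c * np.sqrt(np.log(t + 1) / np.maximum(counts, 1))
    bonus[counts == 0] = np.inf
    return int(np.argmax(q_values + bonus))

def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Double-DQN bootstrap target: the online network *selects* the
    next action, the target network *evaluates* it, which reduces the
    overestimation bias of vanilla DQN."""
    a_star = int(np.argmax(q_online_next))
    return reward + (0.0 if done else gamma * q_target_next[a_star])
```

For example, with Q-estimates `[1.0, 0.2, 0.9]` and counts `[5, 0, 3]`, `ucb_action` returns the untried action 1 despite its low Q-value, since its exploration bonus is unbounded.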