🤖 AI Summary
Cognitive Internet of Things (CIoT) networks face the challenge of enabling energy-constrained secondary users (SUs) to perform dynamic spectrum access under attack by an adversarial smart jammer. Method: This paper proposes a reinforcement learning (RL) framework that jointly optimizes transmission/energy-harvesting mode selection, channel assignment, and continuous power control. To handle the hybrid discrete-continuous action space and the multiple constraints involved (energy causality, interference thresholds, and adversarial jamming), we introduce a novel three-layer hierarchical Deep Deterministic Policy Gradient (H-DDPG) architecture that decouples the levels of decision-making. Furthermore, we model the smart jammer as an adaptive RL-based adversary and formulate a multi-constrained Markov decision process (MDP) capturing the attack-defense interaction. Results: Simulation results demonstrate that the proposed method reduces the jamming-induced communication outage rate by 32% compared with conventional flat RL approaches, while significantly improving throughput and energy efficiency, achieving both robust anti-jamming capability and effective resource utilization.
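To make the constraints concrete, a hedged sketch of the per-slot feasibility conditions the summary names is given below. The notation ($E_t$ for battery level, $e_t^h$ for harvested energy, $p_t$ for transmit power over slot length $\tau$, $g_t$ for the SU-to-PU channel gain, $I_{\mathrm{th}}$ for the interference threshold) is assumed for illustration and is not necessarily the paper's:

$$
p_t \tau \le E_t \quad \text{(energy causality)}, \qquad
E_{t+1} = \min\!\big\{E_t - p_t \tau + e_t^{h},\, E_{\max}\big\},
$$

$$
p_t\, g_t \le I_{\mathrm{th}} \quad \text{(interference threshold on the accessed licensed channel)},
$$

where $p_t = 0$ and $e_t^h > 0$ in a harvesting slot, and $e_t^h = 0$ in a transmitting slot (a common half-duplex assumption).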
📝 Abstract
In this paper, we address the challenge of dynamic spectrum access in a cognitive Internet of Things (CIoT) network where a secondary user (SU) operates under both energy constraints and adversarial interference from a smart jammer. The SU coexists with primary users (PUs) and must ensure that its transmissions do not exceed a predefined interference threshold on the licensed channels. At each time slot, the SU must jointly decide whether to transmit or harvest energy, which channel to access, and the appropriate transmit power, all while satisfying the energy and interference constraints. Meanwhile, a smart jammer actively selects a channel to disrupt, aiming to degrade the SU's communication performance. This setting is challenging because of its multi-level decision structure and its hybrid action space, which mixes discrete and continuous decisions. To tackle this, we propose a novel Hierarchical Deep Deterministic Policy Gradient (H-DDPG) framework that decomposes the decision-making process into three levels: a high-level policy determines the mode (transmit or harvest), a mid-level policy selects the channel, and a low-level policy outputs a continuous transmit power. Concurrently, the jammer is modeled as a reinforcement learning agent that learns an adaptive channel-jamming strategy using a discrete variant of DDPG. Simulation results show that our H-DDPG approach outperforms conventional flat reinforcement learning baselines.
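Since the abstract specifies the three-level decomposition but not the network wiring, the following is a minimal PyTorch sketch of how such a hierarchical actor could select one (mode, channel, power) action per slot. The shared encoder, the use of argmax over deterministic logits for the two discrete levels, and all names and sizes (`HDDPGActor`, `state_dim`, `p_max`) are illustrative assumptions, not the paper's implementation; training the discrete heads end-to-end would further require a differentiable relaxation such as Gumbel-softmax or a per-level critic.

```python
import torch
import torch.nn as nn

class HDDPGActor(nn.Module):
    """Illustrative three-level hierarchical actor (not the paper's exact network).

    Level 1: mode          -> harvest (0) or transmit (1)
    Level 2: channel       -> one of K licensed channels (only meaningful when transmitting)
    Level 3: transmit power-> continuous value in (0, p_max]
    """

    def __init__(self, state_dim: int, num_channels: int, p_max: float, hidden: int = 128):
        super().__init__()
        self.p_max = p_max
        # Shared state encoder (assumed; the paper may use separate networks per level).
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # High level: transmit-vs-harvest logits.
        self.mode_head = nn.Linear(hidden, 2)
        # Mid level: channel logits, conditioned on the chosen mode.
        self.channel_head = nn.Linear(hidden + 2, num_channels)
        # Low level: deterministic power, conditioned on mode and channel choice.
        self.power_head = nn.Sequential(
            nn.Linear(hidden + 2 + num_channels, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # squash to (0, 1), then scale by p_max
        )

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        # argmax is used here for action selection only; it is not differentiable.
        mode = self.mode_head(h).argmax(dim=-1)            # 0 = harvest, 1 = transmit
        mode_onehot = nn.functional.one_hot(mode, 2).float()
        ch_logits = self.channel_head(torch.cat([h, mode_onehot], dim=-1))
        channel = ch_logits.argmax(dim=-1)
        ch_onehot = nn.functional.one_hot(channel, ch_logits.shape[-1]).float()
        power = self.p_max * self.power_head(
            torch.cat([h, mode_onehot, ch_onehot], dim=-1)).squeeze(-1)
        # In a harvesting slot the transmit power is irrelevant; zero it out.
        power = power * mode.float()
        return mode, channel, power

# Usage: one decision per time slot from the current (assumed) state vector.
actor = HDDPGActor(state_dim=8, num_channels=4, p_max=1.0)
mode, channel, power = actor(torch.randn(1, 8))
print(mode.item(), channel.item(), power.item())
```

The jammer side, described in the abstract as a discrete variant of DDPG, could plausibly reuse the same pattern with only a channel head.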