🤖 AI Summary
Cognitive Internet of Things (CIoT) networks face the challenge of enabling energy-constrained secondary users (SUs) to perform dynamic spectrum access under attack by an adversarial smart jammer. Method: This paper proposes a reinforcement learning (RL) framework that jointly optimizes transmission/energy-harvesting mode selection, channel assignment, and continuous power control. To handle the hybrid discrete-continuous action space and the multiple constraints involved (energy causality, interference thresholds, and adversarial jamming), we introduce a novel three-layer hierarchical Deep Deterministic Policy Gradient (H-DDPG) architecture that decouples the levels of decision-making. Furthermore, we model the smart jammer as an adaptive RL-based adversary and formulate a multi-constrained Markov decision process (MDP) capturing the attack-defense interaction. Results: Simulation results demonstrate that the proposed method reduces the jamming-induced communication outage rate by 32% compared with conventional flat RL approaches, while significantly improving throughput and energy efficiency, achieving both robust anti-jamming capability and effective resource utilization.
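To make the constraints concrete, a hedged sketch of the per-slot feasibility conditions the summary names is given below. The notation ($E_t$ for battery level, $e_t^h$ for harvested energy, $p_t$ for transmit power over slot length $\tau$, $g_t$ for the SU-to-PU channel gain, $I_{\mathrm{th}}$ for the interference threshold) is assumed for illustration and is not necessarily the paper's:

$$
p_t \tau \le E_t \quad \text{(energy causality)}, \qquad
E_{t+1} = \min\!\big\{E_t - p_t \tau + e_t^{h},\, E_{\max}\big\},
$$

$$
p_t\, g_t \le I_{\mathrm{th}} \quad \text{(interference threshold on the accessed licensed channel)},
$$

where $p_t = 0$ and $e_t^h > 0$ in a harvesting slot, and $e_t^h = 0$ in a transmitting slot (a common half-duplex assumption).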
📝 Abstract
In this paper, we address the challenge of dynamic spectrum access in a cognitive Internet of Things (CIoT) network where a secondary user (SU) operates under both energy constraints and adversarial interference from a smart jammer. The SU coexists with primary users (PUs) and must ensure that its transmissions do not exceed a predefined interference threshold on the licensed channels. At each time slot, the SU must jointly decide whether to transmit or harvest energy, which channel to access, and the appropriate transmit power, all while satisfying the energy and interference constraints. Meanwhile, a smart jammer actively selects a channel to disrupt, aiming to degrade the SU's communication performance. This setting is challenging because of its multi-level decision structure and its hybrid action space, which mixes discrete and continuous decisions. To tackle this, we propose a novel Hierarchical Deep Deterministic Policy Gradient (H-DDPG) framework that decomposes the decision-making process into three levels: a high-level policy determines the mode (transmit or harvest), a mid-level policy selects the channel, and a low-level policy outputs a continuous transmit power. Concurrently, the jammer is modeled as a reinforcement learning agent that learns an adaptive channel-jamming strategy using a discrete variant of DDPG. Simulation results show that our H-DDPG approach outperforms conventional flat reinforcement learning baselines.
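Since the abstract specifies the three-level decomposition but not the network wiring, the following is a minimal PyTorch sketch of how such a hierarchical actor could select one (mode, channel, power) action per slot. The shared encoder, the use of argmax over deterministic logits for the two discrete levels, and all names and sizes (`HDDPGActor`, `state_dim`, `p_max`) are illustrative assumptions, not the paper's implementation; training the discrete heads end-to-end would further require a differentiable relaxation such as Gumbel-softmax or a per-level critic.

```python
import torch
import torch.nn as nn

class HDDPGActor(nn.Module):
    """Illustrative three-level hierarchical actor (not the paper's exact network).

    Level 1: mode          -> harvest (0) or transmit (1)
    Level 2: channel       -> one of K licensed channels (only meaningful when transmitting)
    Level 3: transmit power-> continuous value in (0, p_max]
    """

    def __init__(self, state_dim: int, num_channels: int, p_max: float, hidden: int = 128):
        super().__init__()
        self.p_max = p_max
        # Shared state encoder (assumed; the paper may use separate networks per level).
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # High level: transmit-vs-harvest logits.
        self.mode_head = nn.Linear(hidden, 2)
        # Mid level: channel logits, conditioned on the chosen mode.
        self.channel_head = nn.Linear(hidden + 2, num_channels)
        # Low level: deterministic power, conditioned on mode and channel choice.
        self.power_head = nn.Sequential(
            nn.Linear(hidden + 2 + num_channels, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # squash to (0, 1), then scale by p_max
        )

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        # argmax is used here for action selection only; it is not differentiable.
        mode = self.mode_head(h).argmax(dim=-1)            # 0 = harvest, 1 = transmit
        mode_onehot = nn.functional.one_hot(mode, 2).float()
        ch_logits = self.channel_head(torch.cat([h, mode_onehot], dim=-1))
        channel = ch_logits.argmax(dim=-1)
        ch_onehot = nn.functional.one_hot(channel, ch_logits.shape[-1]).float()
        power = self.p_max * self.power_head(
            torch.cat([h, mode_onehot, ch_onehot], dim=-1)).squeeze(-1)
        # In a harvesting slot the transmit power is irrelevant; zero it out.
        power = power * mode.float()
        return mode, channel, power

# Usage: one decision per time slot from the current (assumed) state vector.
actor = HDDPGActor(state_dim=8, num_channels=4, p_max=1.0)
mode, channel, power = actor(torch.randn(1, 8))
print(mode.item(), channel.item(), power.item())
```

The jammer side, described in the abstract as a discrete variant of DDPG, could plausibly reuse the same pattern with only a channel head.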