🤖 AI Summary
Co-optimizing communication and control in wireless networked control systems (WNCSs) under finite blocklength constraints remains challenging, particularly when jointly minimizing power consumption while satisfying stringent probabilistic constraints on peak age of information (PAoI) violation, transmit power limits, and schedulability.
Method: This paper proposes a safety-aware deep reinforcement learning (DRL) framework that explicitly models the PAoI violation probability as a stochastic constraint, a formulation first introduced in this context, and provides theoretical guarantees by jointly deriving the maximum allowable transfer interval and end-to-end delay bounds. A teacher-student architecture guides policy learning, while optimization-theoretic conditions enforce safety-critical decision-making.
Contribution/Results: Experiments demonstrate that the proposed approach significantly improves convergence speed, cumulative reward, and system stability compared to baseline DRL methods, while rigorously respecting all operational constraints under finite blocklength communications.
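To make the PAoI constraint concrete: peak age of information is the age of the freshest received update at the controller just before the next update arrives, and the violation probability is the chance that a peak exceeds a deadline. The paper derives this probability analytically from the stochastic MATI and delay constraints; the sketch below only computes the *empirical* quantities from sample timestamps (function names and the example numbers are illustrative, not from the paper):

```python
def paoi_values(gen, dly):
    """Peak AoI samples from sorted generation times gen[i] and
    delivery times dly[i] of successive status updates.

    Just before update i is delivered, the freshest information at the
    monitor is update i-1, so the peak age is dly[i] - gen[i-1].
    """
    return [dly[i] - gen[i - 1] for i in range(1, len(gen))]

def violation_prob(peaks, deadline):
    """Empirical probability that a PAoI sample exceeds the deadline."""
    return sum(p > deadline for p in peaks) / len(peaks)

# Illustrative timestamps: updates generated at 0, 2, 5, 7 and
# delivered at 1, 4, 6, 8 time units.
peaks = paoi_values([0, 2, 5, 7], [1, 4, 6, 8])   # [4, 4, 3]
prob = violation_prob(peaks, 3.5)                 # 2/3 of peaks exceed 3.5
```

In the paper this probability is constrained (not just measured), which is what makes the DRL problem safety-critical rather than a plain reward-maximization task.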
📝 Abstract
In Wireless Networked Control Systems (WNCSs), control and communication systems must be co-designed due to their strong interdependence. This paper presents, for the first time in the literature, a novel optimization-theory-based safe deep reinforcement learning (DRL) framework for ultra-reliable WNCSs that ensures constraint satisfaction while optimizing performance. The approach minimizes power consumption under key constraints, including Peak Age of Information (PAoI) violation probability, transmit power, and schedulability in the finite blocklength regime. The PAoI violation probability is uniquely derived by combining stochastic maximum allowable transfer interval (MATI) and maximum allowable packet delay (MAD) constraints in a multi-sensor network. The framework consists of two stages: optimization theory and safe DRL. The first stage derives optimality conditions that establish mathematical relationships among the variables, simplifying and decomposing the problem. The second stage employs a safe DRL model in which a teacher-student framework guides the DRL agent (student): the control mechanism (teacher) evaluates compliance with the system constraints and suggests the nearest feasible action when needed. Extensive simulations show that the proposed framework outperforms rule-based and optimization-theory-based DRL benchmarks, achieving faster convergence, higher rewards, and greater stability.
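The teacher's role of replacing an unsafe action with the nearest feasible one can be sketched with a toy feasible set. The paper's actual constraints (PAoI violation probability, schedulability, finite blocklength) are far richer; the sketch below assumes only a hypothetical per-link power limit and a total power budget, and uses a simple clip-then-scale correction rather than an exact Euclidean projection onto the true constraint set:

```python
import numpy as np

def nearest_feasible(action, p_max, p_total):
    """Teacher-style safety correction for a vector of per-sensor
    transmit powers proposed by the DRL agent (student).

    Step 1: clip each entry into the per-link range [0, p_max].
    Step 2: if the clipped powers still exceed the total budget
    p_total, scale them down uniformly to meet it.
    """
    a = np.clip(np.asarray(action, dtype=float), 0.0, p_max)
    total = a.sum()
    if total > p_total:
        a *= p_total / total
    return a

# Agent proposes powers outside the hypothetical feasible set.
proposed = np.array([0.5, 1.5, -0.2])
safe = nearest_feasible(proposed, p_max=1.0, p_total=1.2)
# clip -> [0.5, 1.0, 0.0], sum 1.5 > 1.2, scale by 0.8 -> [0.4, 0.8, 0.0]
```

The design point this illustrates is that the agent is never allowed to execute an infeasible action: the environment always receives a corrected action, so the constraints hold during training as well as at deployment.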