🤖 AI Summary
This paper addresses the challenge of ensuring closed-loop stability for multiple plants in a spectrum-constrained, fully distributed wireless networked control system (WNCS). Method: The authors jointly schedule uplink and downlink transmissions, deriving a sufficient stability condition from stochastic systems theory and formulating the scheduling problem as a Markov decision process (MDP) with a finite-length vector state. To tackle the prohibitively large action space, they propose a general action-space reduction technique coupled with action embedding, compatible with deep reinforcement learning (DRL) algorithms including DQN, DDPG, and TD3. Contribution/Results: Experiments show that the proposed approach significantly outperforms conventional baseline policies in both control performance (e.g., reduced steady-state error and lower probability of instability) and communication efficiency (e.g., better channel utilization and lower scheduling overhead), demonstrating that stable, efficient joint scheduling is practical in dynamic WNCSs.
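To make the action-space challenge concrete, here is a minimal sketch of how quickly a joint uplink/downlink schedule grows, and one generic way a factored encoding shrinks what a DRL policy must output. The plant count `N`, channel count `M`, and the per-channel encoding are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch (assumed encoding, not the paper's exact scheme):
# each of M frequency channels carries one of: idle, an uplink of one
# of N plants, or a downlink of one of N plants -> (2N + 1) options
# per channel, so the flat joint action space has (2N + 1)^M entries.
N, M = 10, 5  # hypothetical: 10 plants, 5 frequency channels

num_joint_actions = (2 * N + 1) ** M
print(f"flat action space: {num_joint_actions:,}")  # 4,084,101

# One common reduction: factor the joint action into M per-channel
# decisions, so a policy network outputs M * (2N + 1) scores instead
# of (2N + 1)^M -- 105 outputs instead of ~4 million here.
print(f"factored output size: {M * (2 * N + 1)}")

def decode(index: int) -> list[int]:
    """Recover per-channel assignments from a flat action index
    (base-(2N+1) digits): 0 = idle, 1..N = uplink of plant k,
    N+1..2N = downlink of plant k - N."""
    base = 2 * N + 1
    digits = []
    for _ in range(M):
        digits.append(index % base)
        index //= base
    return digits

assert decode(0) == [0] * M  # index 0 leaves every channel idle
```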
📝 Abstract
We consider a joint uplink and downlink scheduling problem of a fully distributed wireless networked control system (WNCS) with a limited number of frequency channels. Using elements of stochastic systems theory, we derive a sufficient stability condition for the WNCS, stated in terms of both the control and communication system parameters. Once the condition is satisfied, there exists a stationary and deterministic scheduling policy that stabilizes all plants of the WNCS. By analyzing and representing the per-step cost function of the WNCS in terms of a finite-length countable vector state, we formulate the optimal transmission scheduling problem as a Markov decision process and develop a deep reinforcement learning (DRL) based framework for solving it. To tackle the challenge of the large action space in DRL, we propose novel action-space reduction and action-embedding methods for the DRL framework that can be applied to various algorithms, including Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed Deep Deterministic Policy Gradient (TD3). Numerical results show that the proposed algorithm significantly outperforms benchmark policies.
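As background on the stability side: sufficient conditions of this kind in the networked-control literature typically couple plant dynamics with link reliability; a classical example (not the paper's condition) is that a scalar plant x_{k+1} = a·x_k controlled over an i.i.d. erasure link with delivery probability p is mean-square stabilizable only if a²(1 − p) < 1.

For readers unfamiliar with action embedding, the sketch below illustrates one common realization: a nearest-neighbor lookup in the style of Wolpertinger (Dulac-Arnold et al., 2015), where the continuous output of an actor such as DDPG or TD3 is projected onto the set of valid discrete actions. The table size, embedding dimension, and shortlist size `k` are hypothetical, and the paper's embedding may differ in detail.

```python
# Minimal action-embedding sketch (an assumed Wolpertinger-style
# nearest-neighbor design; the paper's embedding may differ). A
# continuous-control agent such as DDPG or TD3 emits a proto-action
# in R^d, which is mapped to the nearest valid discrete action.
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical embedding table: one d-dimensional vector per valid
# schedule remaining after action-space reduction.
num_valid_actions, d = 512, 8
action_embeddings = rng.standard_normal((num_valid_actions, d))

def nearest_action(proto_action: np.ndarray, k: int = 10) -> int:
    """Shortlist the k nearest embeddings, then return the index of
    the closest one (a critic could instead re-rank the shortlist)."""
    dists = np.linalg.norm(action_embeddings - proto_action, axis=1)
    shortlist = np.argpartition(dists, k)[:k]
    return int(shortlist[np.argmin(dists[shortlist])])

proto = np.tanh(rng.standard_normal(d))  # e.g., a TD3 actor's output
print("selected discrete scheduling action:", nearest_action(proto))
```

One design note: the nearest-neighbor projection keeps the actor differentiable end to end (gradients flow through the proto-action, not the lookup), which is what lets continuous-action algorithms like DDPG and TD3 operate over a large discrete scheduling space.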