🤖 AI Summary
This work addresses the limitations of age-centric status updating in wireless control systems operating under energy constraints and unreliable channels, where minimizing information age does not necessarily improve dynamic performance. The authors model status updates as a coupon-collector problem with an expiration mechanism and formulate a two-dimensional average-reward Markov decision process over the sender's stored information lifetime and the receiver's freshness timer. Key contributions include a proof that the optimal schedule has a dual-threshold structure in these two state variables, a closed-form policy for deterministic lifetimes, and a structure-aware Q-learning (SAQ) algorithm that learns the optimal policy without prior knowledge of the channel success probability or lifetime distribution, converging faster than standard Q-learning while matching the performance of value iteration. Experiments demonstrate up to a 50% improvement in system reward over age-based baselines, achieving Level-C effectiveness, i.e., substantially enhanced control efficacy, under tight resource constraints.
📝 Abstract
For status update systems operating over unreliable energy-constrained wireless channels, we address Weaver's long-standing Level-C question: do my packets actually improve the plant's behavior? Each fresh sample carries a stochastic expiration time -- governed by the plant's instability dynamics -- after which the information becomes useless for control. Casting the problem as a coupon-collector variant with expiring coupons, we (i) formulate a two-dimensional average-reward MDP, (ii) prove that the optimal schedule is doubly thresholded in the receiver's freshness timer and the sender's stored lifetime, (iii) derive a closed-form policy for deterministic lifetimes, and (iv) design a Structure-Aware Q-learning algorithm (SAQ) that learns the optimal policy without knowing the channel success probability or lifetime distribution. Simulations validate our theoretical predictions: SAQ matches optimal Value Iteration performance while converging significantly faster than baseline Q-learning, and expiration-aware scheduling achieves up to 50% higher reward than age-based baselines by adapting transmissions to state-dependent urgency -- thereby delivering Level-C effectiveness under tight resource constraints.
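The scheduling setup described above, a two-state MDP over the receiver's freshness timer and the sender's stored sample lifetime, with a learned dual-threshold transmit policy, can be sketched in miniature. Everything below (the state layout, reward, parameters, and the threshold-extraction step) is an illustrative assumption, not the paper's model; in particular, the paper's SAQ exploits the threshold structure during learning, whereas this toy only extracts per-lifetime thresholds from a plain tabular Q-learner afterward.

```python
import random

# Toy model (all parameters are illustrative assumptions, not from the paper):
# state = (a, l): receiver freshness timer a (slots until the stored info
# expires, 0 = stale) and lifetime l of the sender's current sample.
# action: 0 = idle, 1 = transmit (costs COST, succeeds w.p. P_SUCCESS).

P_SUCCESS = 0.7   # unknown to the learner; used only by the simulator
L_MAX = 5         # maximum sample lifetime
COST = 0.4        # energy cost per transmission attempt
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(state, action, rng):
    """One slot of the toy system: reward for fresh info, minus energy cost."""
    a, l = state
    reward = (1.0 if a > 0 else 0.0) - COST * action
    if action == 1 and rng.random() < P_SUCCESS:
        a = l                      # receiver adopts the sample's lifetime
    a = max(a - 1, 0)              # freshness timer counts down
    l = rng.randint(1, L_MAX)      # a fresh sample arrives each slot
    return (a, l), reward

def extract_thresholds(Q):
    """For each stored lifetime l, find the largest freshness timer a at
    which the greedy policy still transmits: a per-lifetime threshold,
    mirroring (loosely) the dual-threshold structure in the abstract."""
    thr = {}
    for l in range(1, L_MAX + 1):
        best = 0
        for a in range(L_MAX + 1):
            if Q[(a, l)][1] > Q[(a, l)][0]:
                best = a + 1
        thr[l] = best
    return thr

def train(episodes=200, horizon=200, seed=0):
    """Plain epsilon-greedy tabular Q-learning on the toy MDP."""
    rng = random.Random(seed)
    Q = {(a, l): [0.0, 0.0]
         for a in range(L_MAX + 1) for l in range(1, L_MAX + 1)}
    for _ in range(episodes):
        state = (0, rng.randint(1, L_MAX))
        for _ in range(horizon):
            if rng.random() < EPS:
                action = rng.randint(0, 1)
            else:
                action = int(Q[state][1] > Q[state][0])
            nxt, r = step(state, action, rng)
            Q[state][action] += ALPHA * (
                r + GAMMA * max(Q[nxt]) - Q[state][action])
            state = nxt
    return Q, extract_thresholds(Q)
```

Running `train()` yields a Q-table and one transmit threshold per stored lifetime; a structure-aware variant would instead constrain the policy to this thresholded form inside the update loop, which is what shrinks the effective search space and speeds convergence.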