🤖 AI Summary
This work addresses the challenge of wireless remote state estimation under random sensor-to-estimator delays, which degrade the utility of received information. Conventional approaches focus solely on information freshness, typically measured by Age of Information (AoI), and overlook the coupling among delay, information content, and energy efficiency. To bridge this gap, the authors propose a unified delay-aware framework that incorporates delayed measurements via posterior-fusion Kalman updates, formulates scheduling as a Markov decision process, and introduces a proximal policy optimization (PPO)-based reinforcement learning scheduler that jointly optimizes information gain and energy consumption without requiring prior knowledge of the delay distribution. The study further establishes, for the first time, an explicit characterization of the dependency between delay and information gain, yielding a tractable stability condition that guarantees bounded estimation error. Experiments demonstrate that the proposed method outperforms random scheduling and baseline RL algorithms such as DQN and A2C with heterogeneous sensors, realistic link energy costs, and stochastic delays, achieving substantially lower estimation error at comparable energy expenditure while remaining robust to variations in measurement availability and noise.
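For concreteness, below is a minimal sketch of how a delayed measurement can be fused into the current posterior without state augmentation, assuming a linear system x_{k+1} = A x_k + w_k, y_k = C x_k + v_k with invertible A. The class and method names are illustrative, and the equivalent-measurement construction shown here is one standard approximation for out-of-sequence measurements, not necessarily the authors' exact update.

```python
import numpy as np

class DelayAwareKF:
    """Kalman filter with a posterior-fusion update for delayed measurements.

    Sketch only: a measurement taken `delay` steps ago is rewritten as an
    equivalent measurement of the *current* state (H_d = C A^{-d}) with an
    inflated noise covariance, avoiding state augmentation. Ignoring the
    cross-correlation with accumulated process noise makes this a consistent
    approximation rather than an exact smoother.
    """

    def __init__(self, A, C, Q, R):
        self.A, self.C, self.Q, self.R = A, C, Q, R
        n = A.shape[0]
        self.x = np.zeros(n)   # state estimate
        self.P = np.eye(n)     # estimate covariance

    def predict(self):
        self.x = self.A @ self.x
        self.P = self.A @ self.P @ self.A.T + self.Q

    def _update(self, y, H, R):
        S = H @ self.P @ H.T + R                 # innovation covariance
        K = self.P @ H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ (y - H @ self.x)
        self.P = (np.eye(len(self.x)) - K @ H) @ self.P

    def fuse(self, y, delay):
        """Fuse a measurement sampled `delay` steps in the past."""
        if delay == 0:
            self._update(y, self.C, self.R)
            return
        A_inv_d = np.linalg.matrix_power(np.linalg.inv(self.A), delay)
        # Process noise accumulated between sampling and arrival.
        Q_acc = sum(np.linalg.matrix_power(self.A, i) @ self.Q
                    @ np.linalg.matrix_power(self.A, i).T
                    for i in range(delay))
        H_d = self.C @ A_inv_d                   # equivalent measurement matrix
        R_d = self.R + H_d @ Q_acc @ H_d.T       # inflated noise covariance
        self._update(y, H_d, R_d)
```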
📝 Abstract
Unpredictable sensor-to-estimator delays fundamentally distort what matters for wireless remote state estimation: not just freshness, but how delay interacts with sensor informativeness and energy efficiency. In this paper, we present a unified, delay-aware framework that models this coupling explicitly and quantifies a delay-dependent information gain, motivating an information-per-joule scheduling objective beyond age-of-information (AoI) proxies. To this end, we first introduce an efficient posterior-fusion update that incorporates delayed measurements without state augmentation, providing a consistent approximation to optimal delayed Kalman updates, and then derive tractable stability conditions ensuring that bounded estimation error is achievable under stochastic, delayed scheduling. These conditions highlight the need for unstable modes to be observable across sensors. Building on this foundation, we cast scheduling as a Markov decision process and develop a proximal policy optimization (PPO) scheduler that learns directly from interaction, requires no prior delay model, and explicitly trades off estimation accuracy, freshness, sensor heterogeneity, and transmission energy through normalized rewards. In simulations with heterogeneous sensors, realistic link-energy models, and random delays, the proposed method learns stably and consistently achieves lower estimation error at comparable energy than random scheduling and strong RL baselines (DQN, A2C), while remaining robust to variations in measurement availability and process/measurement noise.
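As a rough illustration of the normalized reward described above, the sketch below combines an information-gain term (e.g., the reduction in the trace of the error covariance), an AoI penalty, and a link-energy cost. All weights and normalization scales are hypothetical placeholders; the paper's exact reward shaping is not reproduced here.

```python
import numpy as np

def scheduling_reward(cov_reduction, aoi, energy_j,
                      w_info=1.0, w_aoi=0.2, w_energy=0.5,
                      info_scale=1.0, aoi_max=50.0, energy_max=1e-3):
    """Delay-aware scheduling reward (illustrative weights and scales).

    cov_reduction : drop in trace(P) from fusing the received measurement,
                    i.e. the realized information gain of this transmission
    aoi           : age of information of the scheduled sensor, in slots
    energy_j      : transmission energy spent this slot, in joules

    Each term is normalized to roughly [0, 1] so that no single objective
    dominates the advantage estimates used by PPO.
    """
    info = np.clip(cov_reduction / info_scale, 0.0, 1.0)
    staleness = min(aoi / aoi_max, 1.0)
    cost = min(energy_j / energy_max, 1.0)
    return w_info * info - w_aoi * staleness - w_energy * cost
```

A scheduler could then be trained by wrapping the estimation loop in a gym-style environment whose per-step reward is this quantity and optimizing it with an off-the-shelf PPO implementation.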