🤖 AI Summary
This paper addresses the problem of designing sampling policies that minimize the expected time-average Age of Information (AoI) in status-updating systems where the channel statistics, in particular the delay distributions, are unknown and transmissions are unreliable. Specifically, a sensor sends status updates to a receiver over an error-prone channel, and the receiver reports the transmission results back to the sensor over a reliable channel; both channels are subject to random delays. The paper first reviews the threshold structure of the optimal offline policy under known channel statistics, and then reformulates the design of the online algorithm as a stochastic approximation problem. A Robbins-Monro online algorithm is proposed whose threshold approximates the optimal threshold almost surely. Theoretically, the algorithm achieves an $\mathcal{O}(\ln K)$ upper bound on the cumulative AoI regret, where $K$ is the number of successful transmissions, matching the $\Omega(\ln K)$ minimax lower bound and therefore establishing order optimality. A momentum-based stochastic gradient descent variant is added to improve the stability of the online algorithm. Simulation results validate the performance of the proposed algorithm.
📝 Abstract
In this paper, we study a system in which a sensor forwards status updates to a receiver through an error-prone channel, while the receiver sends the transmission results back to the sensor via a reliable channel. Both channels are subject to random delays. To evaluate the timeliness of the status information at the receiver, we use the Age of Information (AoI) metric. The objective is to design a sampling policy that minimizes the expected time-average AoI, even when the channel statistics (e.g., delay distributions) are unknown. We first review the threshold structure of the optimal offline policy under known channel statistics and then reformulate the design of the online algorithm as a stochastic approximation problem. We propose a Robbins-Monro algorithm to solve this problem and demonstrate that the optimal threshold can be approximated almost surely. Moreover, we prove that the cumulative AoI regret of the online algorithm grows at rate $\mathcal{O}(\ln K)$, where $K$ is the number of successful transmissions. In addition, our algorithm is shown to be minimax order optimal, in the sense that for any online learning algorithm, the cumulative AoI regret up to the $K$-th successful transmission grows at a rate of at least $\Omega(\ln K)$ under the worst-case delay distribution. Finally, we improve the stability of the proposed online learning algorithm through a momentum-based stochastic gradient descent algorithm. Simulation results validate the performance of our proposed algorithm.
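To make the stochastic-approximation idea concrete, the following is a minimal sketch of a Robbins-Monro threshold search on a deliberately simplified model, not the paper's algorithm. The assumptions are: the forward channel is reliable, there is no feedback delay, delays are i.i.d., and the sensor waits until the time since the last sample reaches a threshold $b$, so each frame has length $L = \max(b, D)$. Under these assumptions the threshold of interest is the root of $g(b) = \mathbb{E}[\tfrac{1}{2}L^2 - bL]$. The function names, the $1/k$ step size, the projection interval, and the exponential delay model are all illustrative choices.

```python
import math
import random


def learn_threshold(sample_delay, num_updates, lo=0.0, hi=10.0, seed=0):
    """Robbins-Monro search for the root of g(b) = E[0.5*L^2 - b*L], L = max(b, D).

    Simplified toy model (assumption): reliable channel, no feedback delay.
    """
    rng = random.Random(seed)
    beta = 1.0  # arbitrary initial threshold (assumption)
    for k in range(1, num_updates + 1):
        delay = sample_delay(rng)           # observed delay of the k-th update
        frame = max(beta, delay)            # frame length under the threshold rule
        noisy_g = 0.5 * frame * frame - beta * frame  # one-sample estimate of g(beta)
        beta += noisy_g / k                 # diminishing step size 1/k (assumption)
        beta = min(max(beta, lo), hi)       # project back onto [lo, hi]
    return beta


def exp_delay(rng):
    """Hypothetical delay model: exponential with mean 1."""
    return rng.expovariate(1.0)


def reference_root():
    """Bisection on the closed form g(b) = exp(-b) - 0.5*b^2, valid for Exp(1) delays."""
    lo, hi = 0.0, 2.0  # g(0) > 0 and g(2) < 0, so the root lies in between
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if math.exp(-mid) - 0.5 * mid * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("online estimate:", learn_threshold(exp_delay, 200_000))
    print("reference root: ", reference_root())
```

The learner never uses the delay distribution, only the observed delays, which is the point of the stochastic-approximation formulation. The projection step keeps early, large-step iterates inside a bounded interval. The closed-form reference is available only because the toy delay model is exponential, and it serves purely as a sanity check on the online estimate.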