🤖 AI Summary
This work addresses the challenge of real-time sampling and autoregressive Markov source estimation in dynamic yet structurally similar multi-hop wireless networks, where optimal policy design is hindered by high-dimensional action spaces and complex network topologies. The authors propose a graph neural network-based multi-agent reinforcement learning framework that leverages a centralized training with decentralized execution paradigm and incorporates recurrent mechanisms to enable decentralized policy optimization aimed at minimizing time-averaged estimation error. The approach facilitates policy transfer across networks of varying scales but similar structures, exhibits robustness in non-stationary environments, and demonstrates performance improvements as the number of agents increases, significantly outperforming existing baselines.
📄 Abstract
We address real-time sampling and estimation of autoregressive Markovian sources in dynamic yet structurally similar multi-hop wireless networks. Each node caches samples from others and communicates over wireless collision channels, aiming to minimize time-average estimation error via decentralized policies. Due to the high dimensionality of action spaces and the complexity of network topologies, deriving optimal policies analytically is intractable. To address this, we propose a graphical multi-agent reinforcement learning framework for policy optimization. Theoretically, we demonstrate that our proposed policies are transferable, allowing a policy trained on one graph to be effectively applied to structurally similar graphs. Numerical experiments demonstrate that (i) our proposed policy outperforms state-of-the-art baselines; (ii) the trained policies are transferable to larger networks, with performance gains increasing with the number of agents; (iii) the graphical training procedure withstands non-stationarity, even when using independent learning techniques; and (iv) recurrence is pivotal in both independent learning and centralized training with decentralized execution, improving resilience to non-stationarity.
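To illustrate why such policies transfer across structurally similar graphs of different sizes, the sketch below shows a single decentralized step of a recurrent graph-neural-network policy. This is a minimal illustration under assumed dimensions and randomly initialized weights (training is not shown); the function names and architecture details are hypothetical, not the paper's exact model. The key property is that the weight matrices depend only on feature and hidden sizes, never on the number of agents, so the same parameters apply to any graph.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: F node-feature size, H hidden size, N agents.
F, H, N = 4, 8, 5

# Randomly initialized weights for illustration only (no training shown).
W_self = rng.normal(size=(F, H)) * 0.1   # transform of a node's own features
W_nbr = rng.normal(size=(F, H)) * 0.1    # transform of aggregated neighbor features
W_z = rng.normal(size=(2 * H, H)) * 0.1  # GRU-style update gate
W_h = rng.normal(size=(2 * H, H)) * 0.1  # GRU-style candidate state

def gnn_recurrent_step(x, adj, h):
    """One decentralized step: each agent aggregates neighbor features with
    a permutation-invariant mean, then updates its recurrent hidden state."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    nbr = (adj @ x) / deg                          # mean over one-hop neighbors
    m = np.tanh(x @ W_self + nbr @ W_nbr)          # message-passing embedding
    zh = np.concatenate([m, h], axis=1)
    z = 1.0 / (1.0 + np.exp(-(zh @ W_z)))          # update gate in (0, 1)
    h_cand = np.tanh(zh @ W_h)                     # candidate hidden state
    return (1.0 - z) * h + z * h_cand              # recurrent update

# Example topology: a ring of N nodes (a simple multi-hop network).
adj = np.zeros((N, N))
for i in range(N):
    adj[i, (i + 1) % N] = adj[i, (i - 1) % N] = 1.0

x = rng.normal(size=(N, F))   # per-agent local observations
h = np.zeros((N, H))          # recurrent state, carried across time steps
h = gnn_recurrent_step(x, adj, h)
```

Because `W_self`, `W_nbr`, `W_z`, and `W_h` are shared across agents and sized independently of `N`, the same trained parameters can be executed on a larger ring without modification, which is the mechanism behind the transferability claim in the abstract.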