AI Summary
In dense Wi-Fi 8 (IEEE 802.11bn) deployments, coordinated spatial reuse (C-SR) among multiple access points (APs) faces the challenge of dynamic group selection under time-varying interference. To address this, we propose a Hierarchical Multi-Armed Bandit (HMAB) framework, the first to introduce hierarchical reinforcement learning into C-SR control. Our approach features a lightweight Upper Confidence Bound (UCB)-based online learning mechanism that enables adaptive concurrent transmission scheduling while ensuring rapid convergence (40% faster), topology robustness, and long-term performance stability. Experiments across diverse, dynamically interfering topologies demonstrate significant gains in spectral efficiency and throughput. Crucially, the method sustains over 92% of optimal scheduling performance consistently, offering a scalable, low-overhead, distributed intelligent solution for C-SR in high-density wireless networks.
Abstract
Coordination among multiple access points (APs) is integral to IEEE 802.11bn (Wi-Fi 8) for managing contention in dense networks. This letter explores the benefits of Coordinated Spatial Reuse (C-SR) and proposes the use of reinforcement learning to optimize C-SR group selection. We develop a hierarchical multi-armed bandit (MAB) framework that efficiently selects APs for simultaneous transmissions across various network topologies, demonstrating reinforcement learning's promise in Wi-Fi settings. Among several MAB algorithms studied, we identify the upper confidence bound (UCB) as particularly effective, offering rapid convergence, adaptability to changes, and sustained performance.
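To make the idea of a hierarchical UCB bandit concrete, the following is a minimal sketch, not the letter's implementation. It assumes a two-level split in which an upper-level bandit picks a category of C-SR groups and a per-category lower-level bandit picks one group within it; the abstract does not specify how the hierarchy is divided, so this split is an assumption. The class names (`UCB1`, `HierarchicalUCB`), the `simulate` helper, the exploration constant `c=2.0`, and the Bernoulli reward standing in for normalized throughput are all hypothetical choices for illustration.

```python
import math
import random


class UCB1:
    """Standard UCB1 bandit over a fixed number of arms."""

    def __init__(self, n_arms, c=2.0):
        self.counts = [0] * n_arms    # times each arm was played
        self.means = [0.0] * n_arms   # running mean reward per arm
        self.c = c                    # exploration constant (assumed value)
        self.t = 0                    # total plays so far

    def select(self):
        # Play every arm once before relying on the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        # Pick the arm with the highest mean plus exploration bonus.
        return max(
            range(len(self.counts)),
            key=lambda a: self.means[a]
            + math.sqrt(self.c * math.log(self.t) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class HierarchicalUCB:
    """Upper bandit picks a category; a lower bandit picks a group in it."""

    def __init__(self, groups_per_category):
        self.upper = UCB1(len(groups_per_category))
        self.lower = [UCB1(n) for n in groups_per_category]

    def select(self):
        cat = self.upper.select()
        return cat, self.lower[cat].select()

    def update(self, cat, group, reward):
        # Both levels learn from the same observed reward.
        self.upper.update(cat, reward)
        self.lower[cat].update(group, reward)


def simulate(mean_reward, rounds=5000, seed=0):
    """Toy environment: Bernoulli reward as a stand-in for normalized throughput."""
    rng = random.Random(seed)
    agent = HierarchicalUCB([len(row) for row in mean_reward])
    picks = {}
    for _ in range(rounds):
        cat, grp = agent.select()
        reward = 1.0 if rng.random() < mean_reward[cat][grp] else 0.0
        agent.update(cat, grp, reward)
        picks[(cat, grp)] = picks.get((cat, grp), 0) + 1
    return agent, picks


if __name__ == "__main__":
    # Hypothetical mean rewards: two categories with 2 and 3 candidate groups.
    _, picks = simulate([[0.2, 0.3], [0.4, 0.9, 0.5]])
    for key in sorted(picks):
        print(key, picks[key])
```

The hierarchy keeps each bandit's arm set small: instead of one flat bandit over every candidate group, each level only explores its own few options, which is one plausible reason a hierarchical design could converge faster. Handling time-varying interference would likely need a discounted or sliding-window variant of the running mean, which this sketch omits.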