🤖 AI Summary
To address decentralized online optimization in dynamic Wi-Fi environments, covering primary-channel selection, channel-width configuration, and contention-window adaptation, this paper studies multi-armed bandit (MAB) strategies and proposes E-RLB, a lightweight contextual algorithm based on epsilon-greedy exploration. Key design choices are examined: joint versus factorized action spaces, the inclusion of contextual features such as channel load and interference intensity, and optimism-driven, unimodal, and randomized action-selection strategies. Simulations show that contextual and optimism-driven strategies deliver the highest performance and fastest adaptation under recurrent conditions, and that factorizing the action space across specialized agents accelerates convergence at the cost of increased sensitivity to randomized exploration. Despite the inherent inefficiencies of its epsilon-greedy exploration, E-RLB adapts effectively, and its low computational overhead and signaling-free operation make it well suited for large-scale decentralized Wi-Fi deployments.
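To make the idea of a lightweight contextual epsilon-greedy bandit concrete, here is a minimal illustrative sketch. The class name, the discrete contexts ("low_load"/"high_load"), the reward values, and the optimistic initialization are all assumptions for illustration; this is not the paper's E-RLB implementation.

```python
import random

class ContextualEpsilonGreedy:
    """Illustrative contextual epsilon-greedy bandit (not the paper's E-RLB).

    The context is discretized (e.g. low vs. high channel load) and a
    separate value table is kept per context, so the agent can reuse what
    it learned when a recurrent condition reappears.
    """

    def __init__(self, n_arms, epsilon=0.1, optimistic_init=1.0):
        self.n_arms = n_arms
        self.epsilon = epsilon
        # Optimistic initial estimates push the agent to try every arm early.
        self.optimistic_init = optimistic_init
        self.values = {}   # context -> per-arm reward estimates
        self.counts = {}   # context -> per-arm pull counts

    def _tables(self, context):
        if context not in self.values:
            self.values[context] = [self.optimistic_init] * self.n_arms
            self.counts[context] = [0] * self.n_arms
        return self.values[context], self.counts[context]

    def select(self, context):
        values, _ = self._tables(context)
        if random.random() < self.epsilon:
            return random.randrange(self.n_arms)                  # explore
        return max(range(self.n_arms), key=values.__getitem__)    # exploit

    def update(self, context, arm, reward):
        values, counts = self._tables(context)
        counts[arm] += 1
        # Incremental sample-average update of the reward estimate.
        values[arm] += (reward - values[arm]) / counts[arm]

# Toy usage: two contexts with different best channels (hypothetical rewards).
random.seed(0)
agent = ContextualEpsilonGreedy(n_arms=3, epsilon=0.1)
best = {"low_load": 0, "high_load": 2}
for t in range(2000):
    ctx = "low_load" if t % 2 == 0 else "high_load"
    arm = agent.select(ctx)
    reward = 1.0 if arm == best[ctx] else 0.2
    agent.update(ctx, arm, reward)
# After training, the greedy arm per context should match that context's best arm.
```

The per-context tables are what let such an agent re-adapt quickly under recurrent (e.g. periodic) conditions, without any signaling between nodes.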
📝 Abstract
The adoption of dynamic, self-learning solutions for real-time wireless network optimization has recently gained significant attention due to the limited adaptability of existing protocols. This paper investigates multi-armed bandit (MAB) strategies as a data-driven approach to decentralized, online channel-access optimization in Wi-Fi, targeting three dynamic settings: primary channel, channel width, and contention window (CW) adjustment. Key design aspects are examined, including the adoption of joint versus factorized action spaces, the inclusion of contextual information, and the nature of the action-selection strategy (optimism-driven, unimodal, or randomized). State-of-the-art algorithms and a proposed lightweight contextual approach, E-RLB, are evaluated through simulations. Results show that contextual and optimism-driven strategies consistently achieve the highest performance and fastest adaptation under recurrent conditions. Unimodal structures require careful graph construction to ensure that the unimodality assumption holds. Randomized exploration, as adopted in the proposed E-RLB, can induce disruptive parameter reallocations, especially in multi-player settings. Decomposing the action space across several specialized agents accelerates convergence but increases sensitivity to randomized exploration and demands coordination under shared rewards to avoid correlated learning. Finally, despite the inherent inefficiencies of its epsilon-greedy exploration, E-RLB demonstrates effective adaptation and learning, highlighting its potential as a viable low-complexity solution for realistic dynamic deployments.
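The factorized design discussed above, where several specialized agents each control one parameter but all learn from the same shared reward, can be sketched as follows. The action values, the dimension names, and the toy throughput function are hypothetical stand-ins, not the paper's setup.

```python
import random

random.seed(1)

# Hypothetical per-dimension action sets (illustrative values only).
DIMENSIONS = {
    "channel": [1, 6, 11],
    "width_mhz": [20, 40, 80],
    "cw_min": [15, 31, 63],
}

class EpsilonGreedyAgent:
    """One independent epsilon-greedy learner per action dimension."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.values = [0.0] * n_arms
        self.counts = [0] * n_arms

    def select(self):
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# Factorized action space: one agent per dimension. All agents are updated
# from the same shared reward, which is exactly what couples (and can
# correlate) their learning.
agents = {name: EpsilonGreedyAgent(len(arms)) for name, arms in DIMENSIONS.items()}

def toy_throughput(channel, width, cw):
    # Stand-in reward: pretend channel 6, 40 MHz, cw_min 31 is optimal.
    return ((channel == 6) + (width == 40) + (cw == 31)) / 3.0

for _ in range(5000):
    picks = {name: agent.select() for name, agent in agents.items()}
    reward = toy_throughput(DIMENSIONS["channel"][picks["channel"]],
                            DIMENSIONS["width_mhz"][picks["width_mhz"]],
                            DIMENSIONS["cw_min"][picks["cw_min"]])
    for name, agent in agents.items():
        agent.update(picks[name], reward)
```

Each agent searches a 3-arm space instead of the 27-arm joint space, which is why factorization converges faster; the flip side, noted in the abstract, is that one agent's random exploration perturbs the reward signal every other agent sees.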