A Context Augmented Multi-Play Multi-Armed Bandit Algorithm for Fast Channel Allocation in Opportunistic Spectrum Access

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing multi-play multi-armed bandit (MP-MAB) approaches in opportunistic spectrum access, which often neglect channel noise, incur high computational overhead, and are impractical for real systems. To overcome these issues, the paper explicitly models channel noise as a perturbation of the reward and uses channel state information as contextual input. It proposes two context-aware upper confidence bound (UCB) index policies that learn the relationship between context and noise, one with linear regression and the other with a neural network. Experimental results demonstrate that the proposed methods significantly reduce cumulative regret, effectively avoid suboptimal channel selections, and thereby enhance both the fairness of spectrum allocation and overall quality of service.

📝 Abstract

We study the restless contextual multi-play multi-armed bandit (MP-MAB) problem for channel allocation in the opportunistic spectrum access (OSA) scenario. Most existing MP-MAB methods are impractical for real-world OSA systems because they assume idealized conditions, incur a heavy computational cost, and, most importantly, ignore the impact of channel noise, which is directly related to the quality of service. In this study, we capture this impact by modeling channel noise as a perturbation of the arm's reward function in MP-MAB. As there is an implicit correlation between channel state information and channel noise, we take the former as the context for MP-MAB to represent the perturbation caused by the latter. We investigate two types of correlation between the context and the perturbation, linear and nonlinear, and derive an index policy for each. These policies learn the correlation through a linear model and a neural network, respectively, and use the estimated noise value to adjust the upper confidence bound. Numerical experiments demonstrate that the proposed policies achieve lower regret and select sub-optimal arms in a more reasonable way.
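
The idea of adjusting an upper confidence bound with a learned noise estimate can be illustrated with a small sketch. This is not the paper's policy: the abstract does not give the index formula, so the code below assumes a UCB1-style bonus with the estimated noise subtracted, a scalar context per channel, a noise value that can be observed after a channel is played, and stationary (non-restless) channels. All names (`LinearNoiseModel`, `ucb_index`, `select_top_k`, `simulate`) and all numeric values are hypothetical.

```python
import math
import random


class LinearNoiseModel:
    """Least-squares fit of noise ~ slope * context + intercept (scalar context)."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, context, noise):
        # Accumulate the running sums needed for closed-form simple regression.
        self.n += 1
        self.sx += context
        self.sy += noise
        self.sxx += context * context
        self.sxy += context * noise

    def predict(self, context):
        if self.n == 0:
            return 0.0  # no data yet: assume no noise penalty
        denom = self.n * self.sxx - self.sx ** 2
        if self.n < 2 or abs(denom) < 1e-12:
            return self.sy / self.n  # fall back to the mean observed noise
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope * context + intercept


def ucb_index(mean_reward, pulls, t, noise_estimate, c=2.0):
    """UCB1-style index lowered by the estimated noise (assumed form, not the paper's)."""
    if pulls == 0:
        return float("inf")  # force each channel to be tried at least once
    return mean_reward + math.sqrt(c * math.log(t) / pulls) - noise_estimate


def select_top_k(indices, k):
    """Multi-play step: return the k arms with the largest index values."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return order[:k]


def simulate(rounds=2000, k=2, seed=0):
    """Toy environment with linear context-to-noise coupling; returns pull counts."""
    rng = random.Random(seed)
    base = [0.9, 0.8, 0.6, 0.4]   # hypothetical noise-free mean rewards
    slope = [0.8, 0.1, 0.1, 0.1]  # hypothetical sensitivity of noise to context
    n = len(base)
    models = [LinearNoiseModel() for _ in range(n)]
    pulls = [0] * n
    clean_mean = [0.0] * n        # running mean of the noise-free reward
    for t in range(1, rounds + 1):
        ctx = [rng.random() for _ in range(n)]
        idx = [
            ucb_index(clean_mean[i], pulls[i], t, models[i].predict(ctx[i]))
            for i in range(n)
        ]
        for i in select_top_k(idx, k):
            noise = slope[i] * ctx[i]
            reward = base[i] - noise + rng.gauss(0.0, 0.05)
            models[i].update(ctx[i], noise)  # assumes noise is observable after play
            pulls[i] += 1
            clean_mean[i] += ((reward + noise) - clean_mean[i]) / pulls[i]
    return pulls
```

Calling `simulate()` returns how often each of the four toy channels was played. In this sketch the running mean tracks the noise-free reward so that the noise penalty is applied only once, through the index. A nonlinear variant would swap `LinearNoiseModel` for a small neural network with the same `update`/`predict` interface.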

Problem

Research questions and friction points this paper is trying to address.

opportunistic spectrum access

multi-play multi-armed bandit

channel noise

contextual bandit

channel allocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

contextual multi-play multi-armed bandit

opportunistic spectrum access

channel noise modeling