On the Low-Complexity of Fair Learning for Combinatorial Multi-Armed Bandit

📅 2025-01-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the Combinatorial Multi-Armed Bandit problem with Fairness constraints (CMAB-F), tackling the exponential blowup of the super-arm space caused by interference in wireless networks, which hinders real-time joint optimization of fairness and reward. We propose a low-complexity fair learning algorithm based on a "pick-and-compare" paradigm, integrating virtual queueing, upper confidence bound (UCB) estimation, and stochastic sampling-based comparison, thereby reducing super-arm selection from exponential enumeration to constant-time random sampling. Theoretically, the algorithm achieves an $O(\sqrt{T})$ cumulative regret bound while ensuring asymptotic fairness. Simulations under typical interference-constrained settings demonstrate over 90% reduction in computational overhead, with negligible degradation in both fairness and regret performance. Our key contribution is the first framework that jointly optimizes fairness, sample efficiency, and theoretical guarantees in high-dimensional combinatorial decision-making.

📝 Abstract
Combinatorial Multi-Armed Bandit with fairness constraints is a framework in which multiple arms form a super arm that can be pulled in each round under uncertainty, with the goal of maximizing cumulative rewards while ensuring the minimum average reward required by each arm. The existing pessimistic-optimistic algorithm linearly combines virtual queue-lengths (tracking the fairness violations) and Upper Confidence Bound estimates as a weight for each arm and selects the super arm with the maximum total weight. In many scenarios, such as wireless networks with interference constraints, the number of feasible super arms grows exponentially with the number of arms, so evaluating all of them to find the one with the maximum total weight incurs extremely high computational complexity in the pessimistic-optimistic algorithm. To avoid this, we develop a low-complexity fair learning algorithm based on the so-called pick-and-compare approach, which randomly picks $M$ feasible super arms to evaluate. By setting $M$ to a constant, the number of comparison steps in the pessimistic-optimistic algorithm is reduced to a constant, significantly reducing the computational complexity. Our theoretical analysis shows that this low-complexity design incurs only a slight sacrifice in fairness and regret performance. Finally, we validate the theoretical results with extensive simulations.
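The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names (`ucb_weight`, `pick_and_compare`) and the trade-off parameter `eta` are hypothetical, and the per-arm weight simply adds the virtual queue-length to a standard UCB estimate, as the abstract's linear combination suggests.

```python
import math
import random

def ucb_weight(mean, count, t, queue_len, eta=1.0):
    """Per-arm weight: virtual queue-length (fairness pressure) plus a UCB
    estimate, linearly combined; eta is an illustrative trade-off parameter."""
    # Unplayed arms get an infinite bonus so they are eventually explored.
    bonus = math.sqrt(2 * math.log(t + 1) / count) if count > 0 else float("inf")
    return queue_len + eta * (mean + bonus)

def pick_and_compare(feasible_super_arms, weights, M=5, current=None):
    """Randomly pick M feasible super arms (a constant, not the full
    exponential set), compare total weights, and keep the best candidate,
    optionally including the previously selected super arm."""
    candidates = random.sample(feasible_super_arms, min(M, len(feasible_super_arms)))
    if current is not None:
        candidates.append(current)
    return max(candidates, key=lambda sa: sum(weights[a] for a in sa))
```

With `M` held constant, each round costs `O(M)` weight comparisons regardless of how many feasible super arms the interference constraints admit.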
Problem

Research questions and friction points this paper is trying to address.

Multi-armed Bandit
Fairness
Wireless Networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient Strategy Combinations
Randomized Algorithm
Wireless Network Optimization
Xiaoyi Wu
The Pennsylvania State University
Multi-Armed Bandit · Video Streaming · LLM
Bo Ji
Department of Computer Science, Virginia Tech, Blacksburg, Virginia, USA
Bin Li
Department of Electrical Engineering, Pennsylvania State University, University Park, PA, USA