🤖 AI Summary
This paper studies Bandits with Anytime Knapsacks (BwAK), a multi-armed bandit setting in which the cost constraint must hold at every round of the decision-making process, rather than only as a fixed global budget on the total cost. We introduce this "anytime cost constraint" framework and propose SUAK, an adaptive algorithm that uses upper confidence bounds (UCB) to identify the optimal mixture of arms, skips a round whenever playing could violate the anytime constraint, and slightly under-utilizes the available budget to reduce how often rounds must be skipped. We establish a problem-dependent regret upper bound of $O(K \log T)$, matching the bound established in prior work for classical Bandits with Knapsacks (BwK). Simulations verify SUAK's utility in practical settings.
📝 Abstract
We consider bandits with anytime knapsacks (BwAK), a novel version of the BwK problem in which an anytime cost constraint replaces the total cost budget. This setting introduces additional complexity because the constraint must hold throughout the decision-making process. We propose SUAK, an algorithm that uses upper confidence bounds to identify the optimal mixture of arms while balancing exploration and exploitation. SUAK adapts to the budget available in each round and skips a round whenever playing could violate the anytime cost constraint. In particular, SUAK slightly under-utilizes the available cost budget to reduce the need for skipping rounds. We show that SUAK attains the same problem-dependent regret upper bound of $O(K \log T)$ established in prior work under the simpler BwK framework. Finally, we provide simulations that verify the utility of SUAK in practical settings.
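The abstract describes SUAK only at a high level, so the following is not SUAK. It is a minimal, hypothetical Python sketch of the three ingredients the abstract names: an optimistic choice of arm mixture (UCB on rewards, LCB on costs, solved as a one-constraint LP), a skip action used whenever playing could break the anytime constraint, and a slack that under-uses the per-round budget. It assumes Bernoulli rewards and costs in [0, 1] and an anytime constraint of the form "cumulative cost after round t is at most B·t"; the paper's exact constraint, confidence widths, and slack schedule may differ. The names `best_mixture`, `run_sketch`, `B`, and `slack_coef`, and the slack rule `slack_coef * sqrt(log t / t)`, are all illustrative assumptions.

```python
import math
import random


def best_mixture(r, c, b):
    """Maximize sum_k p_k * r[k] subject to sum_k p_k * c[k] <= b, where p is a
    distribution over the K arms plus a zero-reward, zero-cost "skip" arm
    (index -1). With one budget constraint an optimal vertex mixes at most
    two arms, so enumerating pairs and their breakpoints finds the optimum."""
    b = max(b, 0.0)
    arms = list(range(len(r))) + [-1]
    rew = lambda k: 0.0 if k == -1 else r[k]
    cst = lambda k: 0.0 if k == -1 else c[k]
    best_val, best_mix = 0.0, {-1: 1.0}  # skipping is always feasible
    for a, i in enumerate(arms):
        for j in arms[a + 1:]:
            ps = [0.0, 1.0]
            if cst(i) != cst(j):
                # weight on arm i that makes the budget constraint tight
                ps.append((b - cst(j)) / (cst(i) - cst(j)))
            for p in ps:
                if not 0.0 <= p <= 1.0:
                    continue
                cost = p * cst(i) + (1 - p) * cst(j)
                val = p * rew(i) + (1 - p) * rew(j)
                if cost <= b + 1e-12 and val > best_val:
                    best_val = val
                    best_mix = {k: q for k, q in ((i, p), (j, 1 - p)) if q > 0}
    return best_mix


def run_sketch(mu_r, mu_c, B, T, slack_coef=0.5, seed=0):
    """Toy Bernoulli instance: arm k yields reward ~ Bern(mu_r[k]) and cost
    ~ Bern(mu_c[k]). Hypothetical anytime constraint: cumulative cost after
    round t must never exceed B * t."""
    rng = random.Random(seed)
    K = len(mu_r)
    n, sum_r, sum_c = [0] * K, [0.0] * K, [0.0] * K
    total_r = total_c = 0.0
    skips, max_excess = 0, 0.0
    for t in range(1, T + 1):
        untried = [k for k in range(K) if n[k] == 0]
        if untried:
            arm = untried[0]  # pull each arm once before using estimates
        else:
            bonus = [math.sqrt(2 * math.log(t) / n[k]) for k in range(K)]
            ucb_r = [min(1.0, sum_r[k] / n[k] + bonus[k]) for k in range(K)]
            lcb_c = [max(0.0, sum_c[k] / n[k] - bonus[k]) for k in range(K)]
            # under-use the per-round budget by a shrinking slack (hypothetical rule)
            slack = slack_coef * math.sqrt(math.log(t) / t)
            mix = best_mixture(ucb_r, lcb_c, B - slack)
            u, arm = rng.random(), -1
            for k, p in mix.items():  # sample an arm from the mixture
                u -= p
                arm = k
                if u < 0:
                    break
        # costs lie in [0, 1]: play only if even a unit cost keeps us feasible
        if arm == -1 or total_c + 1.0 > B * t:
            skips += 1
        else:
            reward = 1.0 if rng.random() < mu_r[arm] else 0.0
            cost = 1.0 if rng.random() < mu_c[arm] else 0.0
            n[arm] += 1
            sum_r[arm] += reward
            sum_c[arm] += cost
            total_r += reward
            total_c += cost
        max_excess = max(max_excess, total_c - B * t)
    return {"reward": total_r, "cost": total_c, "skips": skips,
            "pulls": sum(n), "max_excess": max_excess}
```

For example, `run_sketch([0.9, 0.5, 0.2], [0.8, 0.3, 0.1], B=0.4, T=2000)` runs the toy instance and reports how many rounds were skipped. Because a round is played only when even a unit cost keeps cumulative cost within B·t, `max_excess` is never positive, so the assumed anytime constraint holds at every round. The slack term is the sketch's stand-in for under-utilization: planning against a slightly smaller budget leaves headroom so the hard skip check fires less often, at the price of a slightly more conservative mixture.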