🤖 AI Summary
This work studies the constrained contextual bandit problem with adversarially chosen contexts, where each action yields stochastic rewards and costs, and the goal is to simultaneously control cumulative regret and cumulative constraint violation. To this end, the authors propose a modular algorithmic framework built on SquareCB, extending regression-based reduction techniques to the adversarial contextual setting. By combining an online regression oracle with an adaptively defined surrogate reward function, the constrained problem is reduced to a standard unconstrained contextual bandit problem. Under realizability assumptions on the reward and cost function classes, the method achieves tight simultaneous control of both regret and constraint violation, with theoretical guarantees that improve upon existing approaches.
📝 Abstract
We study constrained contextual bandits (CCB) with adversarially chosen contexts, where each action yields a random reward and incurs a random cost. We adopt the standard realizability assumption: conditioned on the observed context, rewards and costs are drawn independently from fixed distributions whose expectations belong to known function classes. We consider the continuing setting, in which the algorithm operates over the entire horizon even after the budget is exhausted. In this setting, the objective is to simultaneously control regret and cumulative constraint violation. Building on the seminal SquareCB framework of Foster and Rakhlin (2020), we propose a simple and modular algorithmic scheme that leverages online regression oracles to reduce the constrained problem to a standard unconstrained contextual bandit problem with adaptively defined surrogate reward functions. In contrast to most prior work on CCB, which focuses on stochastic contexts, our reduction yields improved guarantees for the more general adversarial context setting, together with a compact and transparent analysis.
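To make the reduction concrete, here is a minimal sketch of the two ingredients the abstract names: the SquareCB action distribution driven by an online regression oracle's predictions, and one plausible surrogate reward. The Lagrangian form of the surrogate (`reward − λ·cost`, with a dual-ascent update of `λ`) and the names `squarecb_probs`, `lagrangian_surrogate`, and `update_lambda` are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def squarecb_probs(preds, gamma):
    """SquareCB action distribution (Foster & Rakhlin, 2020):
    each non-greedy action gets mass inversely proportional to its
    predicted gap to the greedy action; the greedy action absorbs the rest."""
    K = len(preds)
    b = int(np.argmax(preds))          # greedy action under the oracle's predictions
    p = np.zeros(K)
    for a in range(K):
        if a != b:
            p[a] = 1.0 / (K + gamma * (preds[b] - preds[a]))
    p[b] = 1.0 - p.sum()               # remaining mass to the greedy action
    return p

def lagrangian_surrogate(reward_pred, cost_pred, lam):
    # Assumed surrogate: penalize predicted cost via a Lagrange multiplier,
    # turning the constrained problem into an unconstrained one per round.
    return reward_pred - lam * cost_pred

def update_lambda(lam, observed_cost, per_round_budget, eta):
    # Dual ascent: raise the penalty when spending exceeds the per-round
    # budget rate, and project back to lambda >= 0 otherwise.
    return max(0.0, lam + eta * (observed_cost - per_round_budget))
```

A round then consists of: query the reward and cost oracles on the current context, form the surrogate predictions, sample an action from `squarecb_probs(surrogate, gamma)`, and feed the observed reward/cost back to the oracles and to `update_lambda`. Larger `gamma` concentrates mass on the greedy action (more exploitation); its schedule governs the regret bound.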