Replicable Constrained Bandits

📅 2026-02-16

🤖 AI Summary

This work studies algorithmic replicability in constrained multi-armed bandits (constrained MABs), where the goal is to maximize reward while satisfying multiple constraints, and where repeated executions in the same environment should yield the same sequence of decisions with high probability. We introduce replicability into this setting for the first time, proposing a replicable UCB-type algorithm grounded in the optimism-in-the-face-of-uncertainty (OFU) principle and complemented by a mechanism that controls the algorithm's internal randomness to guarantee high-probability decision consistency. Theoretical analysis shows that the proposed algorithms match non-replicable baselines in their dependence on the time horizon $T$, for both cumulative regret and constraint violation.

📝 Abstract

Algorithmic replicability has recently been introduced to address the need for reproducible experiments in machine learning. A replicable online learning algorithm is one that takes the same sequence of decisions across different executions in the same environment, with high probability. We initiate the study of algorithmic replicability in constrained multi-armed bandit (MAB) problems, where a learner interacts with an unknown stochastic environment for $T$ rounds, seeking not only to maximize reward but also to satisfy multiple constraints. Our main result is that replicability can be achieved in constrained MABs. Specifically, we design replicable algorithms whose regret and constraint violation match those of non-replicable ones in terms of $T$. As a key step toward these guarantees, we develop the first replicable UCB-like algorithm for unconstrained MABs, showing that algorithms that employ the optimism-in-the-face-of-uncertainty principle can be replicable, a result that we believe is of independent interest.
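To make the replicability definition above concrete, here is a minimal Python sketch. It is not the paper's algorithm: it swaps in a simple explore-then-commit learner for the UCB-like method, and it assumes one common way of stabilizing decisions, namely rounding empirical means to a grid whose random offset is shared across executions. All function names, parameters, and the Bernoulli reward model are hypothetical choices for illustration.

```python
import math
import random
from typing import List, Sequence


def shifted_grid_round(estimate: float, width: float, offset: float) -> float:
    """Snap an estimate to the midpoint of its cell on a grid shifted by `offset`."""
    cell = math.floor((estimate - offset) / width)
    return offset + (cell + 0.5) * width


def explore_then_commit(
    means: Sequence[float],
    horizon: int,
    pulls_per_arm: int,
    width: float,
    reward_rng: random.Random,
    shared_rng: random.Random,
) -> List[int]:
    """Return the sequence of arms pulled (illustrative learner, not the paper's).

    Assumes Bernoulli rewards and horizon >= len(means) * pulls_per_arm.
    `reward_rng` models the environment's fresh samples in each execution;
    `shared_rng` models internal randomness that is shared across executions.
    """
    num_arms = len(means)
    offset = shared_rng.uniform(0.0, width)  # shared random grid offset
    decisions: List[int] = []
    sums = [0.0] * num_arms
    # Exploration phase: a fixed schedule, so it is identical in every run.
    for arm in range(num_arms):
        for _ in range(pulls_per_arm):
            decisions.append(arm)
            sums[arm] += 1.0 if reward_rng.random() < means[arm] else 0.0
    # Round the empirical means so nearby estimates usually coincide.
    rounded = [shifted_grid_round(s / pulls_per_arm, width, offset) for s in sums]
    # Commit to the arm with the largest rounded mean; ties go to the lowest index.
    best = max(range(num_arms), key=lambda a: (rounded[a], -a))
    decisions.extend([best] * (horizon - num_arms * pulls_per_arm))
    return decisions


def replicability_rate(
    means: Sequence[float],
    horizon: int,
    pulls_per_arm: int,
    width: float,
    trials: int,
    seed: int = 0,
) -> float:
    """Fraction of trials in which two executions make identical decisions."""
    master = random.Random(seed)
    agree = 0
    for _ in range(trials):
        shared_seed = master.getrandbits(32)
        runs = [
            explore_then_commit(
                means, horizon, pulls_per_arm, width,
                random.Random(master.getrandbits(32)),  # independent rewards
                random.Random(shared_seed),             # same internal randomness
            )
            for _ in range(2)
        ]
        agree += runs[0] == runs[1]
    return agree / trials
```

Calling `replicability_rate([0.3, 0.7], horizon=200, pulls_per_arm=50, width=0.2, trials=100)` estimates how often two executions with independent reward samples but shared internal randomness take exactly the same sequence of decisions. In this sketch, a wider grid makes agreement more likely but blurs the distinction between arms with similar means, which is one way to see the tension between replicability and regret that the abstract addresses.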

Problem

Research questions and friction points this paper is trying to address.

replicability

constrained MAB

online learning

stochastic environment

algorithmic reproducibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

replicable online learning

constrained multi-armed bandits

UCB algorithm