🤖 AI Summary
This paper studies the piecewise-stationary multi-armed bandit problem without prior knowledge of change points. To provide a black-box solution that requires no knowledge of the underlying non-stationarity, we propose DAB (Detection Augmented Bandit), a modular framework that takes any stationary bandit algorithm (e.g., UCB, Thompson Sampling) and augments it with a generic change detector (e.g., CUSUM, GLR), enabling adaptive decision-making. Our work delivers the first feasible, general, and prior-free black-box solution. Under mild assumptions, and provided the input stationary algorithm has order-optimal stationary regret, DAB achieves the order-optimal dynamic regret $\tilde{\mathcal{O}}(\sqrt{N_T T})$, where $N_T$ is the number of changes over the horizon $T$. For self-concordant bandits, it attains optimal dynamic regret without knowledge of the non-stationarity, whereas previous works obtain suboptimal bounds and require such knowledge. The theoretical analysis employs event decomposition and regret decomposition techniques. In simulations, DAB consistently outperforms state-of-the-art methods across varying numbers of changes and, unexpectedly, also generalizes to drifting environments.
📝 Abstract
We study the problem of piecewise stationary bandits without prior knowledge of the underlying non-stationarity. We propose the first $\textit{feasible}$ black-box algorithm applicable to most common parametric bandit variants. Our procedure, termed Detection Augmented Bandit (DAB), is modular, accepting any stationary bandit algorithm as input and augmenting it with a change detector. DAB achieves optimal regret in the piecewise stationary setting under mild assumptions. Specifically, we prove that DAB attains the order-optimal regret bound of $\tilde{\mathcal{O}}(\sqrt{N_T T})$, where $N_T$ denotes the number of changes over the horizon $T$, if its input stationary bandit algorithm has order-optimal stationary regret guarantees. Applying DAB to different parametric bandit settings, we recover recent state-of-the-art results. Notably, for self-concordant bandits, DAB achieves optimal dynamic regret, while previous works obtain suboptimal bounds and require knowledge of the non-stationarity. In simulations on piecewise stationary environments, DAB outperforms existing approaches across varying numbers of changes. Interestingly, despite being theoretically designed for piecewise stationary environments, DAB is also effective in simulations in drifting environments, outperforming existing methods designed specifically for this scenario.
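The modular idea described above (a stationary bandit algorithm wrapped with a change detector) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The class names `UCB1`, `MeanShiftDetector`, and `DetectionAugmentedBandit`, the parameters `explore_prob`, `delta`, and `min_side`, the Hoeffding-style threshold, the forced uniform exploration, and the full restart on detection are all assumptions made for illustration, and the sketch carries none of the regret guarantees stated in the abstract.

```python
import math
import random


class UCB1:
    """Standard stationary UCB1 for rewards in [0, 1]."""

    def __init__(self, n_arms):
        self.n_arms = n_arms
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms

    def select(self):
        # Play every arm once before using confidence bounds.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        total = sum(self.counts)
        return max(
            range(self.n_arms),
            key=lambda a: self.sums[a] / self.counts[a]
            + math.sqrt(2.0 * math.log(total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward


class MeanShiftDetector:
    """Hypothetical detector: scans every split of one arm's reward stream
    and flags a change when the two sample means differ by more than a
    Hoeffding-style radius (an assumed, simplified stand-in for GLR/CUSUM)."""

    def __init__(self, delta=1e-3, min_side=10):
        self.delta = delta
        self.min_side = min_side  # minimum samples on each side of a split
        self.rewards = []

    def add(self, reward):
        self.rewards.append(reward)
        n = len(self.rewards)
        total = sum(self.rewards)
        log_term = math.log(2.0 * n * n / self.delta)
        left = 0.0
        for s in range(1, n):
            left += self.rewards[s - 1]
            if s < self.min_side or n - s < self.min_side:
                continue
            gap = abs(left / s - (total - left) / (n - s))
            radius = math.sqrt(log_term / (2 * s)) + math.sqrt(log_term / (2 * (n - s)))
            if gap > radius:
                return True
        return False


class DetectionAugmentedBandit:
    """Wraps any stationary bandit with per-arm detectors; restarts on alarm."""

    def __init__(self, n_arms, make_bandit, make_detector, explore_prob=0.05, rng=None):
        self.n_arms = n_arms
        self.make_bandit = make_bandit
        self.make_detector = make_detector
        self.explore_prob = explore_prob  # assumed forced-exploration rate
        self.rng = rng or random.Random()
        self.restarts = 0
        self._reset()

    def _reset(self):
        self.bandit = self.make_bandit(self.n_arms)
        self.detectors = [self.make_detector() for _ in range(self.n_arms)]

    def select(self):
        # Occasionally sample a uniform arm so every detector keeps receiving data.
        if self.rng.random() < self.explore_prob:
            return self.rng.randrange(self.n_arms)
        return self.bandit.select()

    def update(self, arm, reward):
        self.bandit.update(arm, reward)
        if self.detectors[arm].add(reward):
            # A detected change discards all history and restarts the base bandit.
            self.restarts += 1
            self._reset()


def simulate(agent, segments, steps_per_segment, rng):
    """Run the agent on Bernoulli arms whose means change between segments."""
    total_reward = 0.0
    for means in segments:
        for _ in range(steps_per_segment):
            arm = agent.select()
            reward = 1.0 if rng.random() < means[arm] else 0.0
            agent.update(arm, reward)
            total_reward += reward
    return total_reward
```

A call such as `simulate(DetectionAugmentedBandit(2, UCB1, MeanShiftDetector), [[0.9, 0.1], [0.1, 0.9]], 1000, random.Random(0))` runs the wrapper on a two-segment environment. Because the wrapper only relies on `select` and `update`, another stationary algorithm or another detector with the same interface could be substituted without touching the wrapper.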