On The Complexity of Best-Arm Identification in Non-Stationary Linear Bandits

📅 2026-03-11

🤖 AI Summary

This work studies fixed-budget best-arm identification in non-stationary linear bandits, where the unknown parameter vector may change adversarially over time and the goal is to identify, with high probability, the arm that maximizes cumulative reward over the entire horizon. The authors establish a lower bound on the error probability that holds for any arm set and depends explicitly on its geometry, in contrast to the known $\exp\left(-\Theta\left(T / H_{G}\right)\right)$ rate, whose complexity $H_{G}$ scales with the dimension $d$. Motivated by this bound, they introduce the Adjacent-optimal design, a specialization of the $\mathcal{X}\mathcal{Y}$-optimal design, and propose the corresponding Adjacent-BAI algorithm. The error probability of Adjacent-BAI matches the lower bound up to constants, which confirms that the bound is tight and establishes the arm-set-dependent complexity of the problem.
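
For reference, the G-optimal design and the related $\mathcal{X}\mathcal{Y}$-optimal design are usually defined as follows. These are the standard textbook definitions rather than notation taken from the paper: the symbols $\lambda$, $\Delta_{\mathcal{X}}$, $A(\lambda)$, and $\mathcal{Y}$ are introduced here for illustration, and the particular direction set used by the Adjacent-optimal design is not stated on this page.

$$
A(\lambda) = \sum_{x \in \mathcal{X}} \lambda_x \, x x^\top, \qquad \lambda \in \Delta_{\mathcal{X}} = \Big\lbrace \lambda \ge 0 : \sum_{x \in \mathcal{X}} \lambda_x = 1 \Big\rbrace
$$

$$
\text{G-optimal:} \quad \min_{\lambda \in \Delta_{\mathcal{X}}} \max_{x \in \mathcal{X}} \; x^\top A(\lambda)^{-1} x
\qquad\qquad
\mathcal{X}\mathcal{Y}\text{-optimal:} \quad \min_{\lambda \in \Delta_{\mathcal{X}}} \max_{y \in \mathcal{Y}} \; y^\top A(\lambda)^{-1} y
$$

Here $\mathcal{Y}$ is a set of directions, typically differences $x - x'$ of arms in $\mathcal{X}$. By the Kiefer-Wolfowitz theorem, the G-optimal value equals $d$ whenever $\mathcal{X}$ spans $\mathbb{R}^d$, which is consistent with a complexity that scales with the dimension regardless of how the arms are arranged.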

📝 Abstract

We study the fixed-budget best-arm identification (BAI) problem in non-stationary linear bandits. Concretely, given a fixed time budget $T\in \mathbb{N}$, finite arm set $\mathcal{X} \subset \mathbb{R}^d$, and a potentially adversarial sequence of unknown parameters $\lbrace \theta_t\rbrace_{t=1}^{T}$ (hence non-stationary), a learner aims to identify the arm with the largest cumulative reward $x_* = \arg\max_{x \in \mathcal{X}} x^\top\sum_{t=1}^T \theta_t$ with high probability. In this setting, it is well-known that uniformly sampling arms from the G-optimal design yields a minimax-optimal error probability of $\exp\left(-\Theta\left(T / H_{G}\right)\right)$, where $H_{G}$ scales proportionally with the dimension $d$. However, this notion of complexity is overly pessimistic, as it is derived from a lower bound in which the arm set consists only of the standard basis vectors, thus masking any potential advantages arising from arm sets with richer geometric structure. To address this, we establish an arm-set-dependent lower bound that, in contrast, holds for any arm set. Motivated by the ideas underlying our lower bound, we propose the Adjacent-optimal design, a specialization of the well-known $\mathcal{X}\mathcal{Y}$-optimal design, and develop the $\textsf{Adjacent-BAI}$ algorithm. We prove that the error probability of $\textsf{Adjacent-BAI}$ matches our lower bound up to constants, verifying the tightness of our lower bound, and establishing the arm-set-dependent complexity of this setting.
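
The static baseline described in the abstract, sampling arms from the G-optimal design, can be illustrated with a minimal sketch. It is not the paper's Adjacent-BAI algorithm. The design is approximated with Frank-Wolfe (Fedorov-Wynn) iterations, and the cumulative parameter is estimated with an importance-weighted estimator $A(\lambda)^{-1} x_t r_t$, which is an assumption made here because the abstract does not say which estimator is used. The function names `g_optimal_design` and `static_design_bai` and all parameters are hypothetical.

```python
import numpy as np


def _leverage(arms, lam):
    """Return x^T A(lam)^{-1} x for every arm x (rows of `arms`)."""
    A = arms.T @ (lam[:, None] * arms)  # A(lam) = sum_x lam_x x x^T
    return np.einsum("ij,jk,ik->i", arms, np.linalg.inv(A), arms)


def g_optimal_design(arms, tol=1e-3, max_iter=10_000):
    """Approximate G-optimal design via Frank-Wolfe (Fedorov-Wynn) steps.

    arms: (n, d) array whose rows span R^d (needed so A(lam) is invertible).
    Returns (weights, max_leverage); the optimal max leverage equals d.
    """
    n, d = arms.shape
    lam = np.full(n, 1.0 / n)  # start from the uniform design
    for _ in range(max_iter):
        lev = _leverage(arms, lam)
        i = int(np.argmax(lev))
        if lev[i] <= d * (1.0 + tol):  # close enough to the optimum d
            break
        # Exact line search for log det A(lam) along the direction e_i - lam.
        step = (lev[i] / d - 1.0) / (lev[i] - 1.0)
        lam = (1.0 - step) * lam
        lam[i] += step
    lam = lam / lam.sum()  # guard against floating-point drift
    return lam, float(_leverage(arms, lam).max())


def static_design_bai(arms, thetas, lam, rng, noise_std=0.0):
    """Sample arms i.i.d. from `lam` for T rounds and guess the best arm.

    thetas: (T, d) array, one (possibly adversarial) parameter per round.
    The estimator A(lam)^{-1} x_t r_t is unbiased for theta_t under i.i.d.
    sampling from lam; using it here is an assumption of this sketch.
    """
    T = thetas.shape[0]
    A_inv = np.linalg.inv(arms.T @ (lam[:, None] * arms))
    pulls = rng.choice(len(arms), size=T, p=lam)
    x = arms[pulls]  # (T, d) pulled arms
    rewards = np.einsum("td,td->t", x, thetas) + noise_std * rng.standard_normal(T)
    theta_sum_hat = A_inv @ (x * rewards[:, None]).sum(axis=0)
    return int(np.argmax(arms @ theta_sum_hat))
```

Calling `g_optimal_design` on the standard basis of $\mathbb{R}^d$ returns the uniform design with maximum leverage $d$, which is the worst-case arm set the abstract says the dimension-dependent rate is derived from. The sampling distribution in `static_design_bai` is fixed in advance and never reacts to observed rewards.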

Problem

Research questions and friction points this paper is trying to address.

best-arm identification

non-stationary linear bandits

fixed-budget

arm-set-dependent complexity

cumulative reward

Innovation

Methods, ideas, or system contributions that make the work stand out.

best-arm identification

non-stationary linear bandits

arm-set-dependent complexity