Regret Tail Characterization of Optimal Bandit Algorithms with Generic Rewards

📅 2026-04-16

🤖 AI Summary

This work addresses the lack of sharp characterizations of regret tail behavior for asymptotically optimal multi-armed bandit algorithms under general reward distributions. It extends the KLinf-UCB algorithm to nonparametric classes of reward distributions satisfying mild regularity conditions. By leveraging KL divergence–based upper confidence bounds, nonparametric statistical assumptions, and refined tail probability analysis, the paper provides the first unified characterization of the regret tail behavior for KL-type UCB algorithms in a nonparametric setting, encompassing both bounded-support and heavy-tailed distributions. The derived upper bounds match known theoretical lower bounds in the finite-support case and simultaneously guarantee asymptotic optimality in expectation.

📝 Abstract

We study the tail behavior of regret in stochastic multi-armed bandits for algorithms that are asymptotically optimal in expectation. While minimizing expected regret is the classical objective, recent work shows that even such algorithms can exhibit heavy regret tails, incurring large regret with non-negligible probability. Existing sharp characterizations of regret tails are largely restricted to parametric settings, such as single-parameter exponential families. In this work, we extend the $\KLinf$-UCB algorithm to a broad nonparametric class of reward distributions satisfying mild assumptions, and establish its asymptotic optimality in expectation. We then analyze the tail behavior of its regret and derive a novel upper bound on the regret tail probability. As special cases, our results recover regret-tail guarantees for both bounded-support and heavy-tailed (moment-bounded) bandit models. Moreover, for the special case of finitely-supported reward distributions, our upper bound matches the known lower bound exactly. Our results thus provide a unified and tight characterization of regret tails for asymptotically optimal KL-based UCB algorithms, going beyond parametric models.
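For readers unfamiliar with the notation, the quantities named in the abstract are commonly defined as sketched below. The symbols are assumptions introduced here, not taken from the paper: $\mathcal{L}$ for the reward class, $m(\cdot)$ for the mean, $\mu^*$ for the best mean, $\Delta_a$ for the gap of arm $a$, and $N_a(T)$ for its pull count. The paper's exact conventions (for instance strict versus weak inequalities) may differ.

```latex
% Assumed notation: arms a = 1..K with reward laws \eta_a in a class \mathcal{L},
% means m(\eta_a), best mean \mu^* = \max_a m(\eta_a), gaps \Delta_a = \mu^* - m(\eta_a).

% KL-inf: the cheapest (in KL) perturbation of \eta within \mathcal{L} whose mean reaches x.
\[
  \operatorname{KL}_{\inf}(\eta, x)
  = \inf\bigl\{ \operatorname{KL}(\eta, \kappa) : \kappa \in \mathcal{L},\ m(\kappa) \ge x \bigr\}
\]

% Regret after T rounds, and the classical asymptotic lower bound on its expectation
% for consistent algorithms; "asymptotically optimal in expectation" means matching it.
\[
  R_T = \sum_{a} \Delta_a \, N_a(T),
  \qquad
  \liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
  \ge \sum_{a : \Delta_a > 0} \frac{\Delta_a}{\operatorname{KL}_{\inf}(\eta_a, \mu^*)}
\]

% Regret tail: the probability of a large regret, studied as a function of x and T.
\[
  \mathbb{P}\bigl( R_T \ge x \bigr)
\]
```

An algorithm can attain the expectation bound while $\mathbb{P}(R_T \ge x)$ still decays slowly in $x$, which is the gap between expected-regret optimality and tail behavior that the abstract refers to.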

Problem

Research questions and friction points this paper is trying to address.

regret tail

multi-armed bandits

asymptotic optimality

nonparametric rewards

stochastic bandits

Innovation

Methods, ideas, or system contributions that make the work stand out.

regret tail

nonparametric bandits

KLinf-UCB