Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

career value

211K/year

🤖 AI Summary

This paper addresses regret minimization in non-stationary heavy-tailed multi-armed bandits, where rewards follow heavy-tailed distributions and their means undergo abrupt changes at unknown time points in a piecewise-stationary environment. To tackle this challenge, we propose Robust-CPD-UCB—a novel UCB-type algorithm that integrates Catoni’s robust estimator into sequential change-point detection, specifically tailored for heavy-tailed settings. We establish the first UCB framework applicable to this setting and derive a tight regret upper bound of $O(sqrt{TV_T K log T})$, where $T$ is the horizon, $K$ the number of arms, and $V_T$ the total variation budget (i.e., cumulative change intensity). We prove its theoretical optimality by providing a matching information-theoretic lower bound. Empirical evaluation on synthetic data as well as real-world financial and communication datasets demonstrates significant improvements over existing methods designed for light-tailed or non-adaptive non-stationary environments.

Technology Category

Application Category

📝 Abstract

Regret minimization in stochastic non-stationary bandits gained popularity over the last decade, as it can model a broad class of real-world problems, from advertising to recommendation systems. Existing literature relies on various assumptions about the reward-generating process, such as Bernoulli or subgaussian rewards. However, in settings such as finance and telecommunications, heavy-tailed distributions naturally arise. In this work, we tackle the heavy-tailed piecewise-stationary bandit problem. Heavy-tailed bandits, introduced by Bubeck et al., 2013, operate on the minimal assumption that the finite absolute centered moments of maximum order $1+epsilon$ are uniformly bounded by a constant $v<+infty$, for some $epsilon in (0,1]$. We focus on the most popular non-stationary bandit setting, i.e., the piecewise-stationary setting, in which the mean of reward-generating distributions may change at unknown time steps. We provide a novel Catoni-style change-point detection strategy tailored for heavy-tailed distributions that relies on recent advancements in the theory of sequential estimation, which is of independent interest. We introduce Robust-CPD-UCB, which combines this change-point detection strategy with optimistic algorithms for bandits, providing its regret upper bound and an impossibility result on the minimum attainable regret for any policy. Finally, we validate our approach through numerical experiments on synthetic and real-world datasets.

Problem

Research questions and friction points this paper is trying to address.

Detecting change points in heavy-tailed bandit rewards

Minimizing regret in non-stationary heavy-tailed bandits

Developing robust algorithms for piecewise-stationary reward distributions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Catoni-style change-point detection for heavy tails

Robust-CPD-UCB combines detection with UCB

Handles piecewise-stationary heavy-tailed bandits

🔎 Similar Papers

Batched Nonparametric Contextual Bandits

2024-02-27IEEE Transactions on Information TheoryCitations: 1

Amazon

Arlington, VA, USA / Bellevue, WA, USA / Boston, MA, USA

AI Research Scientist, CoreML - Monetization