Catoni-Style Change Point Detection for Regret Minimization in Non-Stationary Heavy-Tailed Bandits

πŸ“… 2025-05-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This paper addresses regret minimization in non-stationary heavy-tailed multi-armed bandits, where rewards follow heavy-tailed distributions and their means undergo abrupt changes at unknown time points in a piecewise-stationary environment. To tackle this challenge, we propose Robust-CPD-UCBβ€”a novel UCB-type algorithm that integrates Catoni’s robust estimator into sequential change-point detection, specifically tailored for heavy-tailed settings. We establish the first UCB framework applicable to this setting and derive a tight regret upper bound of $O(sqrt{TV_T K log T})$, where $T$ is the horizon, $K$ the number of arms, and $V_T$ the total variation budget (i.e., cumulative change intensity). We prove its theoretical optimality by providing a matching information-theoretic lower bound. Empirical evaluation on synthetic data as well as real-world financial and communication datasets demonstrates significant improvements over existing methods designed for light-tailed or non-adaptive non-stationary environments.

Technology Category

Application Category

πŸ“ Abstract
Regret minimization in stochastic non-stationary bandits gained popularity over the last decade, as it can model a broad class of real-world problems, from advertising to recommendation systems. Existing literature relies on various assumptions about the reward-generating process, such as Bernoulli or subgaussian rewards. However, in settings such as finance and telecommunications, heavy-tailed distributions naturally arise. In this work, we tackle the heavy-tailed piecewise-stationary bandit problem. Heavy-tailed bandits, introduced by Bubeck et al., 2013, operate on the minimal assumption that the finite absolute centered moments of maximum order $1+epsilon$ are uniformly bounded by a constant $v<+infty$, for some $epsilon in (0,1]$. We focus on the most popular non-stationary bandit setting, i.e., the piecewise-stationary setting, in which the mean of reward-generating distributions may change at unknown time steps. We provide a novel Catoni-style change-point detection strategy tailored for heavy-tailed distributions that relies on recent advancements in the theory of sequential estimation, which is of independent interest. We introduce Robust-CPD-UCB, which combines this change-point detection strategy with optimistic algorithms for bandits, providing its regret upper bound and an impossibility result on the minimum attainable regret for any policy. Finally, we validate our approach through numerical experiments on synthetic and real-world datasets.
Problem

Research questions and friction points this paper is trying to address.

Detecting change points in heavy-tailed bandit rewards
Minimizing regret in non-stationary heavy-tailed bandits
Developing robust algorithms for piecewise-stationary reward distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Catoni-style change-point detection for heavy tails
Robust-CPD-UCB combines detection with UCB
Handles piecewise-stationary heavy-tailed bandits
πŸ”Ž Similar Papers
No similar papers found.