uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs

📅 2024-10-04

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the heavy-tailed multi-armed bandit (HTMAB) problem, in which losses follow heavy-tailed distributions whose scale parameter σ and tail index α are unknown. The goal is a single robust algorithm that needs no prior knowledge of either the environment type (stochastic or adversarial) or the heavy-tail parameters. The paper proposes uniINF, which it presents as the first algorithm to combine Best-of-Both-Worlds (BoBW) guarantees with full parameter-freeness. Its core innovations are: (i) an adaptive skipping-clipping scheme for tuning the losses, (ii) a refined analysis of the dynamics of the log-barrier regularizer, and (iii) an auto-balancing, time-varying learning rate schedule. The authors prove that uniINF achieves near-optimal regret in both stochastic and adversarial heavy-tailed settings, matching, up to logarithmic factors, the lower bounds that hold even when (σ, α) is known.
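For reference, the heavy-tail condition and the lower bounds the summary alludes to can be written out as follows. This is the standard formulation from the heavy-tailed bandit literature as recalled here, not something stated on this page, so the symbols $K$ (number of arms), $T$ (horizon), $\ell_{t,i}$ (loss of arm $i$ at round $t$), $\Delta_i$ (suboptimality gap of arm $i$), and $i^*$ (best arm) are assumptions of this sketch.

$$
\mathbb{E}\big[\,|\ell_{t,i}|^{\alpha}\,\big] \le \sigma^{\alpha}, \qquad \alpha \in (1, 2], \quad \sigma > 0
$$

$$
\text{adversarial (instance-independent):}\quad \Omega\!\big(\sigma K^{1 - 1/\alpha} T^{1/\alpha}\big)
$$

$$
\text{stochastic (instance-dependent):}\quad \Omega\!\Big(\sum_{i \ne i^*} \Big(\frac{\sigma^{\alpha}}{\Delta_i}\Big)^{\frac{1}{\alpha - 1}} \log T\Big)
$$

With $\alpha = 2$ the first bound reduces to the familiar $\sigma\sqrt{KT}$ rate for bounded-variance losses, which is a quick sanity check on the exponents.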

📝 Abstract

In this paper, we present a novel algorithm, uniINF, for the Heavy-Tailed Multi-Armed Bandits (HTMAB) problem, demonstrating robustness and adaptability in both stochastic and adversarial environments. Unlike the stochastic MAB setting, where loss distributions are stationary over time, our study extends to the adversarial setup, where losses are generated from heavy-tailed distributions that depend on both arms and time. uniINF enjoys the so-called Best-of-Both-Worlds (BoBW) property, performing optimally in both stochastic and adversarial environments without knowing the exact environment type. Moreover, our algorithm is also Parameter-Free, i.e., it operates without needing to know the heavy-tail parameters $(\sigma, \alpha)$ a priori. To be precise, uniINF ensures nearly optimal regret in both stochastic and adversarial environments, matching the corresponding lower bounds when $(\sigma, \alpha)$ is known (up to logarithmic factors). To our knowledge, uniINF is the first parameter-free algorithm to achieve the BoBW property for the heavy-tailed MAB problem. Technically, we develop innovative techniques to achieve BoBW guarantees for Parameter-Free HTMABs, including a refined analysis of the dynamics of the log-barrier, an auto-balancing learning rate scheduling scheme, an adaptive skipping-clipping loss tuning technique, and a stopping-time analysis for logarithmic regret.
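The skipping-clipping idea named in the abstract can be illustrated with a minimal sketch. This is not the paper's procedure: the function names, the fixed thresholds, and the Pareto loss model are all assumptions made for illustration, and the adaptive threshold tuning that uniINF relies on is not reproduced here.

```python
import random


def skip_or_clip(loss, skip_threshold, clip_threshold):
    """Return (used_loss, skipped) for one observed loss.

    Hypothetical rule for illustration: losses whose magnitude exceeds
    skip_threshold are discarded (the round triggers no update); all other
    losses are clipped into [-clip_threshold, clip_threshold].
    """
    if abs(loss) > skip_threshold:
        return 0.0, True
    clipped = max(-clip_threshold, min(clip_threshold, loss))
    return clipped, False


def simulate(num_rounds, tail_index, skip_threshold, clip_threshold, seed=0):
    """Feed heavy-tailed samples through skip_or_clip.

    Losses are drawn from a Pareto distribution with shape tail_index
    (support [1, inf)), whose moments of order >= tail_index are infinite.
    Returns the mean of the losses actually used and the number of skips.
    """
    rng = random.Random(seed)
    total = 0.0
    skipped = 0
    for _ in range(num_rounds):
        loss = rng.paretovariate(tail_index)
        used, was_skipped = skip_or_clip(loss, skip_threshold, clip_threshold)
        total += used
        skipped += int(was_skipped)
    return total / num_rounds, skipped


if __name__ == "__main__":
    mean_used, num_skipped = simulate(
        num_rounds=10_000, tail_index=1.5, skip_threshold=50.0, clip_threshold=10.0
    )
    print(f"mean used loss: {mean_used:.3f}, skipped rounds: {num_skipped}")
```

However extreme the raw samples are, every loss that reaches the learner is bounded by `clip_threshold`. That boundedness is what makes a regret analysis tractable, at the price of a bias that an adaptive threshold schedule has to keep under control.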

Problem

Research questions and friction points this paper is trying to address.

Develops uniINF for Heavy-Tailed Multi-Armed Bandits

Ensures optimal performance in stochastic/adversarial environments

Operates without prior knowledge of the heavy-tail parameters

Innovation

Methods, ideas, or system contributions that make the work stand out.

Best-of-Both-Worlds algorithm

Parameter-Free operation

Adaptation to Heavy-Tailed MABs

🔎 Similar Papers

The Extended UCB Policies for Frequentist Multi-armed Bandit Problems

2011-12-08 · Citations: 1
