uniINF: Best-of-Both-Worlds Algorithm for Parameter-Free Heavy-Tailed MABs

📅 2024-10-04
🏛️ arXiv.org
🤖 AI Summary
This paper studies the heavy-tailed multi-armed bandit (HTMAB) problem, in which losses follow heavy-tailed distributions with an unknown scale parameter σ and tail index α. The goal is a single robust algorithm that needs no prior knowledge of the environment type (stochastic or adversarial) or of the heavy-tail parameters. To this end, the authors propose uniINF, the first algorithm that offers Best-of-Both-Worlds (BoBW) guarantees and is fully parameter-free. Its core technical ingredients are (i) an adaptive skipping-clipping loss tuning technique, (ii) a refined analysis of the dynamics of the log-barrier regularizer, (iii) an auto-balancing learning rate schedule, and (iv) a stopping-time analysis that yields logarithmic regret in the stochastic case. The algorithm attains nearly optimal regret in both stochastic and adversarial heavy-tailed settings, matching the corresponding lower bounds for known (σ, α) up to logarithmic factors. It is the first HTMAB algorithm to combine BoBW guarantees with complete parameter-freeness.
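
For readers new to the setting, the heavy-tail parameters (σ, α) are usually formalized as a bound on the α-th moment of the losses. The block below is a sketch of the standard HTMAB assumptions as they commonly appear in this literature; the notation ($K$ arms, horizon $T$, loss $\ell_{t,i}$, chosen arm $A_t$, pseudo-regret $R_T$) is assumed here and may differ from the paper's exact statements.

```latex
% Assumed notation: K arms, horizon T, loss \ell_{t,i} of arm i at round t.
% Heavy-tail condition with tail index \alpha and scale \sigma (both unknown to the learner):
\mathbb{E}\big[\,|\ell_{t,i}|^{\alpha}\,\big] \le \sigma^{\alpha},
\qquad \alpha \in (1, 2],\ \sigma > 0,\ \text{for all } i \in [K],\ t \in [T].

% Stochastic environment: the loss distribution of each arm is fixed over time.
% Adversarial environment: the distribution of \ell_{t,i} may change with t.
% Pseudo-regret of the learner's arm choices A_1, \dots, A_T:
R_T = \max_{i \in [K]} \mathbb{E}\Big[\sum_{t=1}^{T} \big(\ell_{t,A_t} - \ell_{t,i}\big)\Big].
```

With α = 2 the condition reduces to a bounded second moment; smaller α allows heavier tails, which is why an algorithm that does not know α cannot simply fix one truncation level in advance.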

📝 Abstract
In this paper, we present uniINF, a novel algorithm for the Heavy-Tailed Multi-Armed Bandits (HTMAB) problem that is robust and adaptive in both stochastic and adversarial environments. Unlike the stochastic MAB setting, where loss distributions are stationary over time, our study extends to the adversarial setup, where losses are generated from heavy-tailed distributions that depend on both the arm and the time step. uniINF enjoys the so-called Best-of-Both-Worlds (BoBW) property, performing optimally in both stochastic and adversarial environments without knowing the exact environment type. Moreover, it is Parameter-Free, i.e., it operates without knowing the heavy-tail parameters $(\sigma, \alpha)$ a priori. To be precise, uniINF ensures nearly optimal regret in both stochastic and adversarial environments, matching the corresponding lower bounds when $(\sigma, \alpha)$ is known (up to logarithmic factors). To our knowledge, uniINF is the first parameter-free algorithm to achieve the BoBW property for the heavy-tailed MAB problem. Technically, we develop several new techniques to achieve BoBW guarantees for Parameter-Free HTMABs: a refined analysis of the dynamics of the log-barrier, an auto-balancing learning rate scheduling scheme, an adaptive skipping-clipping loss tuning technique, and a stopping-time analysis for logarithmic regret.
Problem

Research questions and friction points this paper is trying to address.

Develops uniINF for Heavy-Tailed Multi-Armed Bandits
Ensures nearly optimal performance in both stochastic and adversarial environments
Operates without prior knowledge of the heavy-tail parameters (σ, α)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Best-of-Both-Worlds algorithm
Parameter-Free operation
Robustness to heavy-tailed losses in both stochastic and adversarial MABs