Achieving adaptivity and optimality for multi-armed bandits using Exponential Kullback-Leibler Maillard Sampling

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the multi-armed bandit problem with rewards drawn from a One-Parameter Exponential Distribution (OPED) family and proposes expklms (Exponential Kullback-Leibler Maillard Sampling), the first algorithm to achieve several theoretical optimality criteria simultaneously: asymptotic optimality (A.O.), minimax optimality up to a logarithmic factor (M.O.), the Sub-UCB property, and a variance-adaptive worst-case regret bound of $O(\sqrt{T \cdot \mathrm{Var}})$. The algorithm combines exact KL-divergence modeling with Maillard-type sampling probabilities and variance-aware exploration. Whereas existing Thompson Sampling (TS)- and Upper Confidence Bound (UCB)-based algorithms each satisfy only a subset of these criteria, expklms reconciles previously conflicting objectives, namely asymptotic efficiency, finite-time robustness, and variance adaptation, within a single method.

📝 Abstract
We study the problem of Multi-Armed Bandits (MAB) with reward distributions belonging to a One-Parameter Exponential Distribution (OPED) family. In the literature, several criteria have been proposed to evaluate the performance of such algorithms, including Asymptotic Optimality (A.O.), Minimax Optimality (M.O.), Sub-UCB, and variance-adaptive worst-case regret bound. Thompson Sampling (TS)-based and Upper Confidence Bound (UCB)-based algorithms have been employed to achieve some of these criteria. However, none of these algorithms simultaneously satisfy all the aforementioned criteria. In this paper, we design an algorithm, Exponential Kullback-Leibler Maillard Sampling (abbrev. expklms), that can achieve multiple optimality criteria simultaneously, including A.O., M.O. with a logarithmic factor, Sub-UCB, and variance-adaptive worst-case regret bound.
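The paper does not spell out the sampling rule here, but Maillard-style sampling generally chooses each arm with probability proportional to $\exp(-N_a \cdot \mathrm{KL}(\hat\mu_a, \hat\mu_{\max}))$, where $N_a$ is the arm's pull count and $\hat\mu_a$ its empirical mean. A minimal sketch of that rule for the Bernoulli special case (function names `bern_kl` and `maillard_probs` are illustrative, not from the paper, and this is the generic KL Maillard rule rather than the paper's expklms):

```python
import math

def bern_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def maillard_probs(means, counts):
    """Sampling weights: arm a gets weight exp(-N_a * KL(mu_hat_a, mu_hat_max)).

    The empirically best arm has KL = 0 and hence weight 1; suboptimal arms
    are exponentially discounted by how distinguishable they are from it.
    """
    mu_max = max(means)
    weights = [math.exp(-n * bern_kl(m, mu_max)) for m, n in zip(means, counts)]
    total = sum(weights)
    return [w / total for w in weights]

# Example: after 100 pulls per arm, the 0.7 arm dominates, the 0.65 arm
# still gets meaningful probability, and the 0.5 arm is nearly ruled out.
probs = maillard_probs([0.5, 0.7, 0.65], [100, 100, 100])
```

Unlike UCB, this yields a fully randomized policy whose action probabilities are available in closed form, which is one reason Maillard-type schemes are attractive for off-policy evaluation.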
Problem

Research questions and friction points this paper is trying to address.

Multi-Armed Bandits optimization
Exponential Kullback-Leibler Maillard Sampling
Simultaneous optimality criteria achievement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Exponential Kullback-Leibler Maillard Sampling
Simultaneous optimality criteria achievement
Variance-adaptive worst-case regret bound