🤖 AI Summary
This paper investigates the asymptotic behavior and regret performance of the variance-aware UCB-V algorithm for the multi-armed bandit (MAB) problem. Addressing the lack of a precise characterization of arm-selection frequencies and the uncertainty regarding convergence, we establish, for the first time, an exact asymptotic characterization of UCB-V's arm-selection rates—revealing potential non-deterministic convergence phenomena. Building upon this, we derive the first high-probability non-asymptotic bound on arm-selection rates. Leveraging this bound, we obtain a refined regret upper bound of order $O(\sqrt{T \log T})$, which improves upon the guarantees of classical UCB in heterogeneous-variance settings. Notably, this is the first non-asymptotic regret bound of this order achieved by any variance-aware algorithm, substantially advancing the theoretical understanding and performance guarantees for UCB-V.
📝 Abstract
In this paper, we study the behavior of the Upper Confidence Bound-Variance (UCB-V) algorithm for the Multi-Armed Bandit (MAB) problem, a variant of the canonical Upper Confidence Bound (UCB) algorithm that incorporates variance estimates into its decision-making process. More precisely, we provide an asymptotic characterization of the arm-pulling rates for UCB-V, extending recent results for the canonical UCB in Kalvit and Zeevi (2021) and Khamaru and Zhang (2024). In an interesting contrast to the canonical UCB, our analysis reveals that the behavior of UCB-V can exhibit instability, meaning that the arm-pulling rates may not always be asymptotically deterministic. Beyond the asymptotic characterization, we also provide non-asymptotic bounds for the arm-pulling rates in the high-probability regime, offering insights into the regret analysis. As an application of this high-probability result, we establish that UCB-V can achieve a more refined regret bound, previously unknown even for more complicated and advanced variance-aware online decision-making algorithms.
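To make the variance-aware index concrete, the following is a minimal simulation sketch of UCB-V on Bernoulli arms. It assumes the standard empirical-Bernstein form of the UCB-V index from Audibert, Munos, and Szepesvári (2009), with exploration function $\mathcal{E}(t) = \zeta \log t$; the parameter names (`b` for the reward range, `zeta` for the exploration constant) and the specific values used are illustrative assumptions, not choices taken from this paper.

```python
import math
import random

def ucbv_index(mean, var, count, t, b=1.0, zeta=1.2):
    """UCB-V index: empirical mean plus a variance-aware (empirical-
    Bernstein) bonus, following Audibert et al. (2009)."""
    e = zeta * math.log(t)  # exploration function E(t) = zeta * log t
    # First bonus term scales with the empirical variance; the second
    # corrects for the reward range b.
    return mean + math.sqrt(2.0 * var * e / count) + 3.0 * b * e / count

def run_ucbv(arm_probs, horizon, seed=0):
    """Simulate UCB-V on Bernoulli arms; returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    sums = [0.0] * k
    sqsums = [0.0] * k

    def pull(i):
        r = 1.0 if rng.random() < arm_probs[i] else 0.0
        counts[i] += 1
        sums[i] += r
        sqsums[i] += r * r

    for i in range(k):  # initialize by pulling each arm once
        pull(i)
    for t in range(k + 1, horizon + 1):
        indices = []
        for i in range(k):
            mean = sums[i] / counts[i]
            # plug-in empirical variance (clipped at 0 for safety)
            var = max(sqsums[i] / counts[i] - mean * mean, 0.0)
            indices.append(ucbv_index(mean, var, counts[i], t))
        pull(max(range(k), key=indices.__getitem__))
    return counts

counts = run_ucbv([0.9, 0.1], horizon=2000)
```

In such a run, the arm-pulling counts `counts` are the quantities whose asymptotic rates the paper characterizes; tracking them across many seeds is one way to observe the non-deterministic limiting behavior the abstract describes.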