Exploration by Optimization with Hybrid Regularizers: Logarithmic Regret with Adversarial Robustness in Partial Monitoring

📅 2024-02-13
🏛️ International Conference on Machine Learning
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
This paper studies online decision-making under partial monitoring and seeks exploration strategies that perform well in both stochastic and adversarial environments, addressing the degradation in regret that existing exploration-by-optimization methods exhibit in stochastic settings. The authors propose a Follow-the-Regularized-Leader (FTRL) framework with a hybrid regularizer that unifies the treatment of locally and globally observable partial-monitoring games. The main theoretical contributions are: (1) a near-optimal best-of-both-worlds stochastic regret bound of $O\left(\sum_{a \neq a^*} \frac{k^2 m^2 \log T}{\Delta_a}\right)$ for locally observable games, improving on the best prior two-world algorithm by a factor of roughly $\Theta(k^2 \log T)$; and (2) the first $O(\log T)$ stochastic regret bound for globally observable games. These results advance the state of the art in partial monitoring, yielding tighter and more adaptive performance guarantees across diverse data-generating mechanisms.

๐Ÿ“ Abstract
Partial monitoring is a generic framework of online decision-making problems with limited feedback. To make decisions from such limited feedback, it is necessary to find an appropriate distribution for exploration. Recently, a powerful approach for this purpose, \emph{exploration by optimization} (ExO), was proposed, which achieves optimal bounds in adversarial environments with follow-the-regularized-leader for a wide range of online decision-making problems. However, a naive application of ExO in stochastic environments significantly degrades regret bounds. To resolve this issue in locally observable games, we first establish a new framework and analysis for ExO with a hybrid regularizer. This development allows us to significantly improve the existing regret bounds of best-of-both-worlds (BOBW) algorithms, which achieve nearly optimal bounds in both stochastic and adversarial environments. In particular, we derive a stochastic regret bound of $O(\sum_{a \neq a^*} k^2 m^2 \log T / \Delta_a)$, where $k$, $m$, and $T$ are the numbers of actions, observations, and rounds, $a^*$ is an optimal action, and $\Delta_a$ is the suboptimality gap for action $a$. This bound is roughly $\Theta(k^2 \log T)$ times smaller than existing BOBW bounds. In addition, for globally observable games, we provide a new BOBW algorithm with the first $O(\log T)$ stochastic bound.
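To make the FTRL-with-hybrid-regularizer idea concrete, the sketch below shows a single FTRL step that minimizes the cumulative estimated loss plus a hybrid regularizer (negative Shannon entropy plus a log-barrier) over the probability simplex. This is an illustrative sketch only: the specific regularizer weights `eta` and `beta` and the generic solver are assumptions for exposition, and the paper's actual algorithm additionally performs the exploration-by-optimization step on top of this update.

```python
import numpy as np
from scipy.optimize import minimize

def ftrl_hybrid_step(cum_loss, eta, beta):
    """One FTRL step with a hybrid regularizer (illustrative sketch).

    Minimizes  <cum_loss, q> + psi(q)  over the probability simplex, where
    psi(q) = (1/eta) * sum_a q_a log q_a  - beta * sum_a log q_a
    combines a negative-entropy term with a log-barrier term.
    """
    k = len(cum_loss)

    def objective(q):
        q = np.clip(q, 1e-12, 1.0)  # guard the logs near the boundary
        neg_entropy = np.sum(q * np.log(q)) / eta
        log_barrier = -beta * np.sum(np.log(q))
        return cum_loss @ q + neg_entropy + log_barrier

    # Simplex constraint: q >= 0 (via bounds) and sum(q) = 1.
    constraints = ({"type": "eq", "fun": lambda q: np.sum(q) - 1.0},)
    bounds = [(1e-9, 1.0)] * k
    q0 = np.ones(k) / k  # start from the uniform distribution
    res = minimize(objective, q0, bounds=bounds,
                   constraints=constraints, method="SLSQP")
    return res.x

# Example: action 0 has accrued the most loss, so it gets the least mass,
# while the log-barrier keeps every action's probability bounded away from 0.
q = ftrl_hybrid_step(np.array([2.0, 0.0, 0.0]), eta=1.0, beta=0.1)
```

The log-barrier component is what keeps all coordinates of `q` strictly positive, which is essential in partial monitoring since every action may be needed to estimate losses from the limited feedback.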
Problem

Research questions and friction points this paper is trying to address.

Improving regret bounds in partial monitoring
Hybrid regularizer for stochastic environments
Optimal bounds in both adversarial and stochastic settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid regularizer enhances exploration optimization
Achieves logarithmic regret in stochastic environments
Improves best-of-both-worlds algorithms significantly