🤖 AI Summary
This paper studies online decision-making under partial monitoring, seeking exploration strategies that perform well in both stochastic and adversarial environments and addressing the regret degradation that existing methods exhibit outside the adversarial setting. The authors propose a Follow-the-Regularized-Leader (FTRL) framework based on exploration by optimization with a hybrid regularizer, covering both locally and globally observable partial-monitoring games. The main theoretical contributions are: (1) a near-optimal best-of-both-worlds stochastic regret bound of $O\left(\sum_{a \neq a^*} \frac{k^2 m^2 \log T}{\Delta_a}\right)$ for locally observable games, improving upon the best prior best-of-both-worlds bound by a factor of roughly $\Theta(k^2 \log T)$; and (2) the first $O(\log T)$ stochastic regret bound for globally observable games. These results advance the state of the art in partial-monitoring learning, providing tighter and more adaptive performance guarantees across diverse data-generating mechanisms.
📄 Abstract
Partial monitoring is a generic framework for online decision-making problems with limited feedback. To make decisions from such limited feedback, it is necessary to find an appropriate distribution for exploration. Recently, a powerful approach for this purpose, \emph{exploration by optimization} (ExO), was proposed, which achieves optimal bounds in adversarial environments with follow-the-regularized-leader for a wide range of online decision-making problems. However, a naive application of ExO in stochastic environments significantly degrades regret bounds. To resolve this issue in locally observable games, we first establish a new framework and analysis for ExO with a hybrid regularizer. This development allows us to significantly improve the existing regret bounds of best-of-both-worlds (BOBW) algorithms, which achieve nearly optimal bounds in both stochastic and adversarial environments. In particular, we derive a stochastic regret bound of $O(\sum_{a \neq a^*} k^2 m^2 \log T / \Delta_a)$, where $k$, $m$, and $T$ are the numbers of actions, observations, and rounds, respectively, $a^*$ is an optimal action, and $\Delta_a$ is the suboptimality gap for action $a$. This bound is roughly $\Theta(k^2 \log T)$ times smaller than existing BOBW bounds. In addition, for globally observable games, we provide a new BOBW algorithm with the first $O(\log T)$ stochastic bound.
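To make the FTRL-with-hybrid-regularizer idea concrete, here is a minimal illustrative sketch in Python. It computes the FTRL distribution over actions by minimizing cumulative loss plus a hybrid regularizer (negative Shannon entropy plus a log-barrier) over the probability simplex. Everything here is an assumption for illustration: the function name `ftrl_hybrid`, the specific regularizer weights `eta` and `gamma`, and the softmax-parametrized gradient-descent inner solver are mine, not the paper's ExO algorithm, and the setting shown is a toy full-information one rather than partial monitoring.

```python
import numpy as np

def ftrl_hybrid(cum_loss, eta=1.0, gamma=0.05, steps=3000, lr=0.1):
    """Approximately solve the FTRL subproblem (illustrative sketch only):

        p* = argmin_{p in simplex}  <cum_loss, p>
             + (1/eta)  * sum_i p_i log p_i    # negative Shannon entropy
             + gamma    * sum_i (-log p_i)     # log-barrier component

    The simplex constraint is handled by parametrizing p = softmax(z) and
    running plain gradient descent on the logits z.
    """
    k = len(cum_loss)
    z = np.zeros(k)  # uniform start keeps the log-barrier term finite
    for _ in range(steps):
        p = np.exp(z - z.max())
        p /= p.sum()
        # Gradient of the objective with respect to p.
        g_p = cum_loss + (1.0 / eta) * (np.log(p) + 1.0) - gamma / p
        # Chain rule through softmax: (J^T g)_j = p_j * (g_j - <p, g>).
        g_z = p * (g_p - p @ g_p)
        z -= lr * g_z
    p = np.exp(z - z.max())
    return p / p.sum()

# Toy usage: action 0 has the smallest cumulative loss, so it gets most mass,
# while the log-barrier keeps every action's probability bounded away from 0.
p = ftrl_hybrid(np.array([0.0, 5.0, 5.0]))
print(p)
```

The intuition this is meant to convey: the entropy term drives exponential-weights-style exploitation of low-loss actions, while the log-barrier term enforces a probability floor on every action, which is the kind of forced-exploration effect a hybrid regularizer provides.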