🤖 AI Summary
This work establishes the minimax expected regret lower bound for nonstochastic multi-armed bandits with expert advice. Specifically, for the setting of $K$ arms, $N$ experts, and $T$ rounds, adversarial instances combined with information-theoretic and combinatorial arguments yield a tight lower bound of $\Omega\big(\sqrt{T K \log(N/K)}\big)$, matching the best known upper bound. The analysis fully characterizes the optimal regret rate for this model, resolving a long-standing open problem in the theory of expert-augmented bandits and closing the gap between upper and lower bounds in adversarial bandits with expert feedback. The proof technique, combining an adversarial construction with refined information-theoretic reasoning, adds to the methodological toolkit for deriving sharp minimax bounds in sequential decision-making under uncertainty.
📝 Abstract
We determine the minimax optimal expected regret in the classic non-stochastic multi-armed bandit with expert advice problem, by proving a lower bound that matches the upper bound of Kale (2014). The two bounds determine the minimax optimal expected regret to be $\Theta\left(\sqrt{T K \log(N/K)}\right)$, where $K$ is the number of arms, $N$ is the number of experts, and $T$ is the time horizon.
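As a purely illustrative sketch (not from the paper), the snippet below evaluates the $\sqrt{T K \log(N/K)}$ rate for a few hypothetical values of $T$, $K$, and $N$ and contrasts it with the classical $\sqrt{T K \log N}$ rate associated with Exp4-style algorithms; constant factors are omitted and the chosen values are arbitrary.

```python
import math

def minimax_rate(T: int, K: int, N: int) -> float:
    """Minimax expected regret rate, up to constants: sqrt(T * K * log(N / K)).
    Assumes N > K so that the logarithm is positive (illustrative only)."""
    return math.sqrt(T * K * math.log(N / K))

def exp4_style_rate(T: int, K: int, N: int) -> float:
    """Classical O(sqrt(T * K * log N)) rate, shown for comparison."""
    return math.sqrt(T * K * math.log(N))

if __name__ == "__main__":
    # Hypothetical horizon and arm count; N varies to show when log(N/K) differs from log N.
    T, K = 10_000, 10
    for N in (100, 1_000, 100_000):
        print(f"N={N:>7}: "
              f"sqrt(TK log(N/K)) ~ {minimax_rate(T, K, N):8.1f}   "
              f"sqrt(TK log N) ~ {exp4_style_rate(T, K, N):8.1f}")
```

The gap between the two rates is most visible when $N$ is only polynomially larger than $K$; as $N$ grows far beyond $K$, $\log(N/K)$ approaches $\log N$ and the two expressions nearly coincide.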