🤖 AI Summary
This paper studies worst-case regret in multi-armed bandits under Knightian uncertainty. We consider a strictly uncertain environment in which the decision maker observes only finitely many samples and faces an adversarial nature. To formalize this, we construct a probabilistic game-theoretic model and employ minimax analysis combined with asymptotic theory. We rigorously prove that the greedy policy—selecting the arm with the highest empirical mean—is optimal in the worst-case regret sense, and that its worst-case regret converges to zero at rate $O(1/\sqrt{n})$ as the sample size $n$ grows. This constitutes the first theoretical demonstration of the greedy policy’s optimality within a purely non-probabilistic uncertainty framework. Empirical evaluation on Google restaurant review data shows that the greedy policy significantly outperforms uniform sampling and Thompson Sampling, confirming both its theoretical soundness and practical efficacy.
📝 Abstract
In this paper, we propose a probabilistic game-theoretic model to study the worst-case regret of the greedy strategy under complete (Knightian) uncertainty. In a game between a decision-maker (DM) and an adversarial agent (Nature), the DM observes a realization of product ratings for each product. Upon observation, the DM chooses a strategy, which is a function from the set of observations to the set of products. We study the theoretical properties, including the worst-case regret, of the greedy strategy that chooses the product with the highest observed average rating. We prove that, with respect to the worst-case regret, the greedy strategy is optimal and that, in the limit, its regret converges to zero. We validate the model on data collected from Google reviews for restaurants, showing that the greedy strategy not only performs in line with the theoretical findings but also outperforms the uniform strategy and the Thompson Sampling algorithm.
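As a minimal sketch of the greedy strategy described above (the function name and data layout are illustrative, not taken from the paper): given the observed ratings for each product, it simply selects the product with the highest empirical mean rating.

```python
import statistics

def greedy_choice(ratings):
    """Return the index of the product with the highest observed average rating.

    `ratings` is a list of lists: ratings[i] holds the observed ratings for
    product i. This is a hypothetical data layout used only for illustration.
    """
    means = [statistics.mean(r) for r in ratings]
    # Break ties by the lowest index, as max() does on the first maximal key.
    return max(range(len(ratings)), key=lambda i: means[i])
```

For example, with `ratings = [[3, 4], [5, 5], [1, 2]]`, the empirical means are 3.5, 5.0, and 1.5, so the greedy strategy selects product 1. Note this sketch plays no role in the paper's minimax analysis; it only illustrates the decision rule whose worst-case regret the paper studies.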