🤖 AI Summary
This paper addresses both stochastic and adversarial multi-armed bandit problems by proposing the first unified analytical framework for the Tsallis-INF algorithm that entirely avoids Fenchel conjugates. Methodologically, it leverages modern tools from online convex optimization—specifically, Bregman divergences and direct characterization of dual updates—to replace conventional, conjugate-dependent derivations. This yields concise, unified proofs of optimal regret bounds: $O(\sqrt{KT})$ in the adversarial setting and $O\big(\sum_{i:\Delta_i>0} \frac{\log T}{\Delta_i}\big)$ in the stochastic setting. The key contribution is the complete elimination of Fenchel conjugates, markedly enhancing theoretical interpretability and analytical scalability. To the best of our knowledge, this work is the first to rigorously establish that Tsallis-INF simultaneously achieves optimal performance across both environments while admitting a significantly simplified, conjugate-free analysis.
📝 Abstract
In this short note, we present a simple derivation of the best-of-both-worlds guarantee for the Tsallis-INF multi-armed bandit algorithm from J. Zimmert and Y. Seldin, "Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits," Journal of Machine Learning Research, 22(28):1–49, 2021. URL https://jmlr.csail.mit.edu/papers/volume22/19-753/19-753.pdf. In particular, the proof uses modern tools from online convex optimization and avoids the use of conjugate functions. Also, we do not optimize the constants in the bounds, in favor of a slimmer proof.
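To make the conjugate-free viewpoint concrete: for the $\alpha = 1/2$ Tsallis regularizer, the distribution played at round $t$ can be characterized directly as $p_i = 4/(\eta_t (\hat{L}_i - x))^2$, where $x$ is the Lagrange multiplier enforcing $\sum_i p_i = 1$ and found by Newton's method, with no conjugate function in sight. The sketch below illustrates this; the learning-rate schedule $\eta_t = 2/\sqrt{t}$, the Newton initialization, and the toy Bernoulli environment are illustrative choices, not prescriptions from the note.

```python
import numpy as np

def tsallis_inf_probs(L, t, newton_iters=50):
    """Tsallis-INF (alpha = 1/2) arm distribution for cumulative loss estimates L.

    Solves p_i = 4 / (eta_t * (L_i - x))^2 with x chosen so that sum_i p_i = 1,
    via Newton's method on f(x) = sum_i p_i(x) - 1 (note f'(x) = eta * sum_i p_i^{3/2}).
    """
    eta = 2.0 / np.sqrt(t)        # illustrative learning-rate schedule
    x = np.min(L) - 2.0 / eta     # keeps L_i - x > 0 and starts with sum(p) >= 1
    for _ in range(newton_iters):
        p = 4.0 / (eta * (L - x)) ** 2
        x -= (p.sum() - 1.0) / (eta * np.sum(p ** 1.5))
    return 4.0 / (eta * (L - x)) ** 2

# Toy run on a 5-armed Bernoulli bandit (arm 0 has the smallest mean loss).
rng = np.random.default_rng(0)
K, T = 5, 2000
means = np.array([0.1, 0.5, 0.5, 0.5, 0.5])
L_hat = np.zeros(K)                       # importance-weighted cumulative losses
for t in range(1, T + 1):
    p = tsallis_inf_probs(L_hat, t)
    i = rng.choice(K, p=p / p.sum())      # renormalize against numerical drift
    loss = rng.binomial(1, means[i])
    L_hat[i] += loss / p[i]               # standard importance-weighted estimator
```

The Newton step uses the closed-form derivative $\mathrm{d}p_i/\mathrm{d}x = \eta\, p_i^{3/2}$, which is exactly the normalization routine given in Zimmert and Seldin's pseudocode; the point of the note is that the regret analysis, too, can work with this direct characterization instead of conjugate functions.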