Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the stochastic multi-armed bandit problem with i.i.d. rewards in which the expected reward function is multimodal, with at most m modes (i.e., local extrema). To exploit this structure, the authors propose the first computationally tractable algorithmic framework for solving the Graves-Lai optimization problem beyond the classical unimodality assumption. The method combines an information-theoretic lower-bound analysis with mode-aware confidence intervals, enabling adaptive exploration that respects the multimodal structure. The resulting policy is proven to match the information-theoretic regret lower bound, and empirical evaluations show faster convergence than existing baselines in multimodal environments. The implementation is publicly available.
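For context, the Graves-Lai lower bound referenced in the summary is typically stated as follows; the notation below is the standard one for structured bandits and is assumed here, not copied from the paper:

```latex
% For any uniformly good policy on instance \theta, the regret satisfies
%   \liminf_{T \to \infty} R(T) / \log T \ge C(\theta),
% where C(\theta) is the value of a semi-infinite linear program over
% exploration allocations n_a \ge 0:
\begin{align*}
C(\theta) = \min_{n \ge 0} \quad & \sum_{a} n_a \, \Delta_a \\
\text{s.t.} \quad & \sum_{a} n_a \, \mathrm{KL}(\theta_a, \lambda_a) \ge 1
  \quad \text{for all } \lambda \in \Lambda(\theta),
\end{align*}
```

where $\Delta_a$ is the suboptimality gap of arm $a$ and $\Lambda(\theta)$ is the set of "confusing" alternative instances (here, multimodal instances with a different optimal arm). The paper's contribution is making this optimization tractable when $\Lambda(\theta)$ encodes the at-most-$m$-modes constraint.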

📝 Abstract
We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits
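The paper's algorithm solves the Graves-Lai problem under the multimodal constraint, which requires handling structurally coupled exploration allocations. As a point of reference, in the *unstructured* Gaussian case the problem decouples per arm and the optimal constant has a closed form. A minimal sketch of that baseline (function names and the unit-variance Gaussian assumption are ours, not the paper's):

```python
def gaussian_kl(mu: float, lam: float, sigma2: float = 1.0) -> float:
    """KL divergence between N(mu, sigma2) and N(lam, sigma2)."""
    return (mu - lam) ** 2 / (2.0 * sigma2)

def graves_lai_unstructured(means: list[float]) -> float:
    """Graves-Lai constant for an unstructured Gaussian bandit.

    Without structural constraints the optimization decouples: each
    suboptimal arm a needs n_a * KL(mu_a, mu*) >= 1, so the optimal
    allocation is n_a = 1 / KL(mu_a, mu*) and the constant is
    sum_a Delta_a / KL(mu_a, mu*) (the Lai-Robbins bound).
    """
    best = max(means)
    return sum(
        (best - mu) / gaussian_kl(mu, best)
        for mu in means
        if mu < best
    )

# Example: three Gaussian arms with unit variance.
# Arm gaps 0.5 and 1.0 give 0.5/0.125 + 1.0/0.5 = 4 + 2 = 6.
print(graves_lai_unstructured([1.0, 0.5, 0.0]))  # -> 6.0
```

The multimodal setting studied in the paper does not decouple this way: the confusing alternatives must themselves have at most m modes, which couples the per-arm constraints and is what makes the tractable solver non-trivial.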
Problem

Research questions and friction points this paper is trying to address.

Solving multimodal stochastic bandits with unknown reward modes
Developing tractable algorithms for Graves-Lai optimization problems
Providing asymptotically optimal algorithms for multimodal bandit settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes first tractable algorithm for Graves-Lai optimization
Enables asymptotically optimal multimodal bandit algorithms
Addresses stochastic bandits with multimodal reward functions
William Réveillard
Division of Decision and Control Systems, KTH Royal Institute of Technology, 11428 Stockholm, Sweden
Richard Combes
Assistant Professor, Supélec
machine learning, applied probability, networks, information theory