Multimodal Bandits: Regret Lower Bounds and Optimal Algorithms

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies the stochastic multi-armed bandit problem with i.i.d. rewards in which the expected reward function is multimodal, with at most m modes (i.e., local extrema). To exploit this structure, the authors propose the first computationally tractable algorithmic framework for solving the Graves-Lai optimization problem beyond the classical unimodality assumption. The method combines an information-theoretic lower-bound analysis with mode-aware confidence intervals, enabling adaptive exploration that respects the multimodal structure. The resulting policy is proven to match the information-theoretic regret lower bound, and empirical evaluations show faster convergence than existing baselines in multimodal environments. The implementation is publicly available.
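For context, the Graves-Lai lower bound referenced in the summary is typically stated as follows; the notation below is the standard one for structured bandits and is assumed here, not copied from the paper:

```latex
% For any uniformly good policy on instance \theta, the regret satisfies
%   \liminf_{T \to \infty} R(T) / \log T \ge C(\theta),
% where C(\theta) is the value of a semi-infinite linear program over
% exploration allocations n_a \ge 0:
\begin{align*}
C(\theta) = \min_{n \ge 0} \quad & \sum_{a} n_a \, \Delta_a \\
\text{s.t.} \quad & \sum_{a} n_a \, \mathrm{KL}(\theta_a, \lambda_a) \ge 1
  \quad \text{for all } \lambda \in \Lambda(\theta),
\end{align*}
```

where $\Delta_a$ is the suboptimality gap of arm $a$ and $\Lambda(\theta)$ is the set of "confusing" alternative instances (here, multimodal instances with a different optimal arm). The paper's contribution is making this optimization tractable when $\Lambda(\theta)$ encodes the at-most-$m$-modes constraint.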

📝 Abstract
We consider a stochastic multi-armed bandit problem with i.i.d. rewards where the expected reward function is multimodal with at most m modes. We propose the first known computationally tractable algorithm for computing the solution to the Graves-Lai optimization problem, which in turn enables the implementation of asymptotically optimal algorithms for this bandit problem. The code for the proposed algorithms is publicly available at https://github.com/wilrev/MultimodalBandits
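The paper's algorithm solves the Graves-Lai problem under the multimodal constraint, which requires handling structurally coupled exploration allocations. As a point of reference, in the *unstructured* Gaussian case the problem decouples per arm and the optimal constant has a closed form. A minimal sketch of that baseline (function names and the unit-variance Gaussian assumption are ours, not the paper's):

```python
def gaussian_kl(mu: float, lam: float, sigma2: float = 1.0) -> float:
    """KL divergence between N(mu, sigma2) and N(lam, sigma2)."""
    return (mu - lam) ** 2 / (2.0 * sigma2)

def graves_lai_unstructured(means: list[float]) -> float:
    """Graves-Lai constant for an unstructured Gaussian bandit.

    Without structural constraints the optimization decouples: each
    suboptimal arm a needs n_a * KL(mu_a, mu*) >= 1, so the optimal
    allocation is n_a = 1 / KL(mu_a, mu*) and the constant is
    sum_a Delta_a / KL(mu_a, mu*) (the Lai-Robbins bound).
    """
    best = max(means)
    return sum(
        (best - mu) / gaussian_kl(mu, best)
        for mu in means
        if mu < best
    )

# Example: three Gaussian arms with unit variance.
# Arm gaps 0.5 and 1.0 give 0.5/0.125 + 1.0/0.5 = 4 + 2 = 6.
print(graves_lai_unstructured([1.0, 0.5, 0.0]))  # -> 6.0
```

The multimodal setting studied in the paper does not decouple this way: the confusing alternatives must themselves have at most m modes, which couples the per-arm constraints and is what makes the tractable solver non-trivial.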
Problem

Research questions and friction points this paper is trying to address.

Solving multimodal stochastic bandits with unknown reward modes
Developing tractable algorithms for Graves-Lai optimization problems
Providing asymptotically optimal algorithms for multimodal bandit settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes first tractable algorithm for Graves-Lai optimization
Enables asymptotically optimal multimodal bandit algorithms
Addresses stochastic bandits with multimodal reward functions
William Réveillard
Division of Decision and Control Systems, KTH Royal Institute of Technology, 11428 Stockholm, Sweden
Richard Combes
Assistant Professor, Supélec
machine learning, applied probability, networks, information theory