Identifying the Best Transition Law

📅 2025-02-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies best-arm identification in structured multi-armed bandits where the support of each arm's reward distribution is known. Specifically, it addresses the setting where each arm's reward follows a multinomial distribution over a known finite support, a setting motivated by recursive learning in Markov decision processes. To exploit this structural prior, the authors propose EL-LUCB, which integrates empirical likelihood (EL) into the LUCB framework by building a confidence region for the whole probability vector over the support. They compare it with LUCB using classical non-parametric confidence intervals that ignore the support, and with variants that apply Hoeffding or Bernstein bounds to each coordinate of the probability vector independently. By modeling the coordinates jointly, EL-LUCB uses the support knowledge to construct tighter, data-adaptive confidence regions. Simulations on synthetic scenarios with varying structural complexity indicate that EL-LUCB reduces sample complexity while improving identification accuracy, supporting the benefit of both support-aware modeling and joint estimation for structured pure-exploration bandit problems.
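To make the LUCB baseline concrete, here is a minimal Python sketch of the LUCB sampling and stopping rule with Hoeffding confidence intervals that use no support knowledge beyond the assumption that rewards lie in [0, 1]. The exploration rate `log(5 K t^4 / (4 delta))` is modeled on the LUCB1 rate; that constant and the function names `hoeffding_radius` and `lucb` are illustrative assumptions, not the authors' code.

```python
import math
import random
from typing import Callable, List


def hoeffding_radius(n: int, t: int, n_arms: int, delta: float, width: float = 1.0) -> float:
    """Hoeffding radius for the mean of n samples lying in an interval of length `width`.

    The log term log(5 * K * t^4 / (4 * delta)) mimics the LUCB1 exploration rate;
    the constant is an assumption, not taken from the paper.
    """
    beta = math.log(5 * n_arms * t**4 / (4 * delta))
    return width * math.sqrt(beta / (2 * n))


def lucb(sample: Callable[[int], float], n_arms: int, delta: float,
         max_samples: int = 200_000) -> int:
    """Return the index of the arm judged best by a LUCB-style sampling and stopping rule."""
    if n_arms < 2:
        raise ValueError("LUCB needs at least two arms")
    sums: List[float] = [0.0] * n_arms
    counts: List[int] = [0] * n_arms
    for a in range(n_arms):  # pull every arm once
        sums[a] += sample(a)
        counts[a] += 1
    t = n_arms  # total number of samples drawn so far
    while t < max_samples:
        means = [sums[a] / counts[a] for a in range(n_arms)]
        rad = [hoeffding_radius(counts[a], t, n_arms, delta) for a in range(n_arms)]
        h = max(range(n_arms), key=lambda a: means[a])  # empirical leader
        # Strongest challenger: highest upper confidence bound among the other arms.
        l = max((a for a in range(n_arms) if a != h), key=lambda a: means[a] + rad[a])
        if means[h] - rad[h] > means[l] + rad[l]:  # intervals separated: stop
            return h
        for a in (h, l):  # otherwise sample both the leader and the challenger
            sums[a] += sample(a)
            counts[a] += 1
            t += 1
    # Sampling budget exhausted: fall back to the empirical leader.
    return max(range(n_arms), key=lambda a: sums[a] / counts[a])


# Example: three Bernoulli arms (a multinomial on the support {0, 1}).
rng = random.Random(0)
arm_means = [0.3, 0.5, 0.7]
best = lucb(lambda a: float(rng.random() < arm_means[a]), n_arms=3, delta=0.05)
```

The confidence bounds enter only through `rad`, so support-aware variants can be obtained by replacing the Hoeffding radius with tighter lower and upper bounds on each arm's mean while keeping the same sampling and stopping logic.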

📝 Abstract
Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance reached by several strategies, notably LUCB, without and with use of this knowledge. In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
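The abstract's second case applies deviation bounds to each coordinate of the probability vector independently. Below is a minimal Python sketch of one way to turn such per-coordinate bounds into a confidence interval on an arm's mean, under stated assumptions: each coordinate gets a two-sided Hoeffding interval with a union bound over the m support points, and the mean's bounds are the extreme values of the sum of q_i * x_i over that box intersected with the probability simplex, computed greedily. The name `box_mean_bounds` and this box-plus-simplex construction are illustrative assumptions; the paper's exact use of the Hoeffding and Bernstein bounds may differ, and a Bernstein variant would swap in a variance-dependent radius.

```python
import math
from typing import List, Sequence, Tuple


def box_mean_bounds(counts: Sequence[int], support: Sequence[float],
                    delta: float) -> Tuple[float, float]:
    """Lower/upper bounds on sum_i q_i * x_i over a per-coordinate Hoeffding box.

    Each coordinate p_i is bounded independently, with a union bound over the m
    support points; the extreme means over box ∩ simplex are found greedily.
    """
    n = sum(counts)
    m = len(support)
    p_hat = [c / n for c in counts]
    r = math.sqrt(math.log(2 * m / delta) / (2 * n))  # two-sided radius per coordinate
    lo = [max(0.0, p - r) for p in p_hat]
    hi = [min(1.0, p + r) for p in p_hat]

    def extreme_mean(order: List[int]) -> float:
        # Start from the lower corner of the box, then pour the missing mass into
        # the coordinates in `order` (fractional-knapsack greedy, optimal for this LP).
        q = lo[:]
        remaining = max(0.0, 1.0 - sum(lo))
        for i in order:
            add = min(hi[i] - lo[i], remaining)
            q[i] += add
            remaining -= add
            if remaining <= 0.0:
                break
        return sum(qi * xi for qi, xi in zip(q, support))

    by_value_desc = sorted(range(m), key=lambda i: support[i], reverse=True)
    upper = extreme_mean(by_value_desc)        # extra mass pushed to large support values
    lower = extreme_mean(by_value_desc[::-1])  # extra mass pushed to small support values
    return lower, upper


# Example: 100 observations on the support {0, 0.5, 1}; the empirical mean is 0.35.
low, up = box_mean_bounds([50, 30, 20], [0.0, 0.5, 1.0], delta=0.05)
```

The greedy step is exact here because maximizing a linear function over a box with a single sum constraint is a fractional knapsack problem. Treating the coordinates separately, however, ignores that they must sum to one jointly, which is the looseness a joint confidence region aims to remove.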
Problem

Research questions and friction points this paper is trying to address.

Identify the best arm in bandit problems
Compare strategies with multinomial rewards
Evaluate effectiveness using simulation scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Support-aware variants of the LUCB strategy
Empirical Likelihood method
Multinomial reward estimation
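The empirical-likelihood and multinomial-estimation items above can be illustrated with a minimal Python sketch of a joint, support-aware bound on an arm's mean. It assumes the EL confidence region for the probability vector is {q on the known support : n KL(p_hat || q) <= threshold}. The largest mean in that region is found by bisection on mu, where the minimal divergence K_inf(p_hat, mu) = min{KL(p_hat || q) : E_q[X] >= mu} is computed through its one-dimensional concave dual (Honda and Takemura). The function names and the free `threshold` parameter are assumptions. How to calibrate the threshold (an asymptotic χ² quantile, or a time-uniform rate for sequential stopping) is left open and is not taken from the paper.

```python
import math
from typing import Sequence


def kinf(p_hat: Sequence[float], support: Sequence[float], mu: float, iters: int = 100) -> float:
    """K_inf(p_hat, mu) = min{ KL(p_hat || q) : q on `support`, E_q[X] >= mu }.

    Computed via the concave one-dimensional dual
        max_{0 <= lam <= 1 / (x_max - mu)}  sum_i p_i * log(1 - lam * (x_i - mu)),
    valid because `support` contains x_max, so q may put mass there.
    """
    x_max = max(support)
    mean = sum(p * x for p, x in zip(p_hat, support))
    if mu <= mean:
        return 0.0
    if mu >= x_max:
        return math.inf

    def dual(lam: float) -> float:
        return sum(p * math.log(1.0 - lam * (x - mu))
                   for p, x in zip(p_hat, support) if p > 0)

    lo, hi = 0.0, 1.0 / (x_max - mu)
    for _ in range(iters):  # ternary search; the dual is concave in lam
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dual(m1) < dual(m2):
            lo = m1
        else:
            hi = m2
    return dual((lo + hi) / 2)


def el_upper_bound(counts: Sequence[int], support: Sequence[float],
                   threshold: float, iters: int = 60) -> float:
    """Largest mu such that n * K_inf(p_hat, mu) <= threshold (bisection on mu)."""
    n = sum(counts)
    p_hat = [c / n for c in counts]
    lo = sum(p * x for p, x in zip(p_hat, support))  # K_inf = 0 at the empirical mean
    hi = max(support)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if n * kinf(p_hat, support, mid) <= threshold:
            lo = mid
        else:
            hi = mid
    return lo


def el_lower_bound(counts: Sequence[int], support: Sequence[float],
                   threshold: float, iters: int = 60) -> float:
    """Lower bound on the mean: minus the upper bound for the mirrored support."""
    return -el_upper_bound(counts, [-x for x in support], threshold, iters)


# Example: an arm on the support {0, 1} with 50 zeros and 50 ones.
thr = 4.0  # hypothetical threshold, not calibrated
upper = el_upper_bound([50, 50], [0.0, 1.0], thr)
lower = el_lower_bound([50, 50], [0.0, 1.0], thr)
```

On a two-point support the dual reduces to the Bernoulli divergence kl(p_hat, mu), which gives a quick sanity check. Because `support` includes every known support point, the optimizing q may place mass on values not yet observed; this is where the known support enters, unlike a classical empirical likelihood restricted to observed values.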
Mehrasa Ahmadipour
UMPA, ENS de Lyon, Lyon, France
Élise Crépon
UMPA, ENS de Lyon, Lyon, France
Aurélien Garivier
Ecole Normale Supérieure de Lyon
Machine learning, Sequential Statistics, Information Theory, Statistics, Markov Models