Identifying the Best Transition Law

📅 2025-02-17

🤖 AI Summary

This paper studies best-arm identification in multi-armed bandits where each arm's reward follows a multinomial distribution with a known finite support. To exploit this prior knowledge, the authors combine the Empirical Likelihood (EL) method with the LUCB strategy. The resulting EL-LUCB models the probability vector over the support jointly, with the aim of obtaining tighter, data-adaptive confidence regions. It is compared against LUCB with classical non-parametric confidence intervals, which ignore the support, and against LUCB with Hoeffding and Bernstein deviation bounds applied to each dimension of the probability vector independently. Simulations on synthetic scenarios with varying levels of structural complexity are used to assess the benefit of support-aware modeling and of joint estimation in terms of sample complexity and identification accuracy.
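As a point of reference for the comparison described above, here is a minimal sketch of the LUCB loop with Hoeffding-style intervals on multinomial arms. It is not the authors' implementation: the function name `lucb_hoeffding`, the arm representation (one probability vector per arm over a shared `support`), the round cap, and the exploration rate `log(1.25 * K * t^4 / delta)` (one common choice from the LUCB literature) are all assumptions made for illustration.

```python
import math
import random


def lucb_hoeffding(arms, support, delta=0.05, max_rounds=100_000, seed=0):
    """Sketch of LUCB with Hoeffding intervals (illustrative, not the paper's code).

    arms[a] is a probability vector over `support` (hypothetical representation).
    Returns (index of the arm with the best empirical mean, total number of pulls).
    """
    rng = random.Random(seed)
    n_arms = len(arms)
    width = max(support) - min(support)  # reward range used by the Hoeffding bound
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    def pull(a):
        # Draw one reward from arm a's multinomial distribution.
        counts[a] += 1
        sums[a] += rng.choices(support, weights=arms[a])[0]

    for a in range(n_arms):  # initialise with one pull per arm
        pull(a)

    for t in range(1, max_rounds + 1):
        means = [s / c for s, c in zip(sums, counts)]
        # Assumed exploration rate; other choices are possible.
        beta = math.log(1.25 * n_arms * t**4 / delta)
        radius = [width * math.sqrt(beta / (2 * c)) for c in counts]
        # Leader: best empirical mean. Challenger: best upper bound among the rest.
        leader = max(range(n_arms), key=means.__getitem__)
        challenger = max((a for a in range(n_arms) if a != leader),
                         key=lambda a: means[a] + radius[a])
        # Stop once the leader's lower bound clears the challenger's upper bound.
        if means[leader] - radius[leader] >= means[challenger] + radius[challenger]:
            break
        pull(leader)
        pull(challenger)

    best = max(range(n_arms), key=lambda a: sums[a] / counts[a])
    return best, sum(counts)
```

For example, `lucb_hoeffding([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]], [0, 1, 2])` compares two arms with means 0.3 and 1.7. Replacing `radius` with a tighter bound changes only the stopping and sampling decisions, not the structure of the loop.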

📝 Abstract

Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance reached by several strategies, notably LUCB, without and with the use of this knowledge. In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
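To illustrate what a bound on the joint probability vector can look like, the sketch below computes an upper confidence bound on an arm's mean as the largest mean among distributions q on the known support with KL(p_hat, q) <= eps, where p_hat is the empirical frequency vector. This is a standard likelihood-ratio form of such a region. The abstract does not spell out the exact construction or threshold used by EL-LUCB, so the function name `el_upper_bound` and the parameter `eps` (for instance an exploration rate divided by the number of pulls) are assumptions.

```python
import math


def el_upper_bound(p_hat, support, eps, tol=1e-10):
    """Max of sum(q * x) over distributions q on `support` with KL(p_hat, q) <= eps.

    Illustrative sketch only; assumes p_hat sums to 1 and eps is the KL radius.
    """
    x_max = max(support)
    atoms = [(p, x) for p, x in zip(p_hat, support) if p > 0]  # observed points
    if eps <= 0 or all(x == x_max for _, x in atoms):
        return sum(p * x for p, x in atoms)  # nothing to gain: empirical mean
    p_top = sum(p for p, x in atoms if x == x_max)

    def tilt(nu):
        # Lagrangian solution: q_i proportional to p_i / (nu - x_i), nu >= x_max.
        weights = [p / (nu - x) for p, x in atoms]
        z = sum(weights)
        kl = sum(p * math.log((nu - x) * z) for p, x in atoms)
        mean = sum(w * x for w, (_, x) in zip(weights, atoms)) / z
        return kl, mean

    if p_top == 0:
        # The largest support point was never observed: if budget remains after
        # tilting at nu = x_max, move the leftover mass 1 - r onto x_max.
        kl0, mean0 = tilt(x_max)
        if kl0 <= eps:
            r = math.exp(kl0 - eps)
            return r * mean0 + (1 - r) * x_max

    # KL decreases in nu, so bracket the solution and bisect on nu.
    lo, gap = x_max, 1.0
    while tilt(x_max + gap)[0] > eps:
        gap *= 2
    hi = x_max + gap
    while hi - lo > tol * (1 + abs(x_max)):
        mid = lo + (hi - lo) / 2
        if tilt(mid)[0] > eps:
            lo = mid
        else:
            hi = mid
    return tilt(hi)[1]
```

With two support points this reduces to the Bernoulli case: `el_upper_bound([0.5, 0.5], [0, 1], 0.1)` returns the value u > 0.5 at which the KL divergence between Bernoulli(0.5) and Bernoulli(u) equals 0.1. Unlike a per-dimension Hoeffding or Bernstein bound, the region constrains all coordinates of the probability vector at once.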

Problem

Research questions and friction points this paper is trying to address.

Identify the best arm in bandit problems

Compare strategies with multinomial rewards

Evaluate effectiveness using simulation scenarios

Innovation

Methods, ideas, or system contributions that make the work stand out.

LUCB strategy optimization

Empirical Likelihood method

Multinomial reward estimation
