🤖 AI Summary
This paper studies best-arm identification in multi-armed bandits under a known support constraint: the rewards of each arm follow a multinomial distribution with a known finite support. To exploit this structural prior, the authors propose EL-LUCB, which integrates empirical likelihood (EL) into the LUCB framework by modeling the joint probability vector over the support. Baseline LUCB variants either ignore the support and use classical nonparametric confidence intervals, or apply Hoeffding or Bernstein deviation bounds to each dimension independently; EL-LUCB instead uses the support knowledge to construct data-adaptive confidence regions on the whole vector. Experiments on synthetic benchmarks with varying structural complexity are reported to show that EL-LUCB reduces sample complexity, supporting the value of both support-aware modeling and joint estimation in pure-exploration bandit problems.
📝 Abstract
Motivated by recursive learning in Markov Decision Processes, this paper studies best-arm identification in bandit problems where each arm's reward is drawn from a multinomial distribution with a known support. We compare the performance reached by several strategies, notably LUCB, without and with the use of this knowledge. In the first case, we use classical non-parametric approaches for the confidence intervals. In the second case, where a probability distribution is to be estimated, we first use classical deviation bounds (Hoeffding and Bernstein) on each dimension independently, and then the Empirical Likelihood method (EL-LUCB) on the joint probability vector. The effectiveness of these methods is demonstrated through simulations on scenarios with varying levels of structural complexity.
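To make the contrast between a support-agnostic bound and a joint, likelihood-based bound concrete, the following Python sketch computes two upper confidence bounds on the mean reward of a single arm. It is an illustration under stated assumptions, not the paper's implementation: the function names, the threshold `log(1/delta)`, and the KL-ball form of the empirical-likelihood region (all distributions `q` on the support with `n * KL(p_hat || q) <= threshold`) are choices made here, and the sketch further assumes that every support point has been observed at least once.

```python
import math
import numpy as np


def hoeffding_upper_bound(support, counts, delta):
    """Hoeffding upper bound on the mean, using only the range of the support."""
    support = np.asarray(support, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    mean_hat = float(counts @ support) / n
    span = support.max() - support.min()
    return mean_hat + span * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def kl(p, q):
    """KL(p || q) for strictly positive probability vectors on the same support."""
    return float(np.sum(p * np.log(p / q)))


def el_upper_bound(support, counts, delta, iters=100):
    """Largest mean q . support over q with n * KL(p_hat || q) <= log(1/delta).

    Assumption of this sketch: all counts are positive. The maximizer then has
    the form q_i proportional to p_hat_i / (nu - x_i) with nu > max(x), and the
    divergence decreases as nu grows, so nu is found by bisection.
    """
    support = np.asarray(support, dtype=float)
    counts = np.asarray(counts, dtype=float)
    if np.any(counts <= 0):
        raise ValueError("this sketch assumes every support point was observed")
    n = counts.sum()
    p_hat = counts / n
    eps = math.log(1.0 / delta) / n  # hypothetical threshold calibration
    x_max = support.max()
    span = x_max - support.min()
    if span == 0.0:
        return float(x_max)

    def tilt(nu):
        w = p_hat / (nu - support)
        return w / w.sum()

    lo = x_max + 1e-12 * span  # nu close to x_max: divergence is large
    hi = x_max + span
    while kl(p_hat, tilt(hi)) > eps:  # grow hi until the constraint holds
        hi = x_max + 2.0 * (hi - x_max)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl(p_hat, tilt(mid)) > eps:
            lo = mid
        else:
            hi = mid
    return float(tilt(hi) @ support)


support = [0.0, 0.5, 1.0]
counts = [30, 50, 20]
ucb_h = hoeffding_upper_bound(support, counts, delta=0.05)
ucb_el = el_upper_bound(support, counts, delta=0.05)
print(ucb_h, ucb_el)
```

With the same threshold, Pinsker's inequality implies that the KL-based bound never exceeds the Hoeffding bound, which is one way to see why using the joint probability vector can tighten the confidence region. In an LUCB-style procedure, bounds of this kind would be computed for every arm at each round; the threshold actually used by EL-LUCB is not specified here.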