An Algorithm for Fixed Budget Best Arm Identification with Combinatorial Exploration

📅 2025-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies fixed-budget best-arm identification with combinatorial exploration: in each round the agent may probe a subset of arms and observes the sample average of their rewards, and the goal is to identify the best of $K$ arms within the given budget. It proposes an algorithm that builds $\log_2 K$ groups, runs a likelihood ratio test on each group, and applies Hamming decoding to recover the best arm. It introduces a new hardness parameter $H_4$ and derives an upper bound on the error probability in terms of $H_4$. It then demonstrates cases in which the algorithm outperforms state-of-the-art algorithms for the single-play setting.

📝 Abstract

We consider the best arm identification (BAI) problem in the $K$-armed bandit framework with a modification: the agent is allowed to play a subset of arms at each time slot instead of a single arm. Consequently, the agent observes the sample average of the rewards of the arms that constitute the probed subset. Several trade-offs arise here. For example, sampling a larger number of arms together gives a wider view of the environment, while sampling fewer arms yields more information about individual reward distributions. Furthermore, although grouping a large number of suboptimal arms together reduces the variance of the group reward, it may raise the group mean so that it is close to that of the group containing the optimal arm. To solve this problem, we propose an algorithm that constructs $\log_2 K$ groups and performs a likelihood ratio test to detect the presence of the best arm in each of these groups. A Hamming decoding procedure then determines the unique best arm. We derive an upper bound on the error probability of the proposed algorithm in terms of a new hardness parameter $H_4$. Finally, we demonstrate cases in which it outperforms state-of-the-art algorithms for the single-play case.
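The group-then-decode idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm, and everything in it is an assumption made for illustration: group $j$ contains the arms whose index has bit $j$ set, a plain comparison of sample means between a group and its complement stands in for the likelihood ratio test, and decoding is reduced to reading the per-group decisions as the bits of the arm index. The names `build_groups`, `pull_subset`, and `identify_best_arm` are hypothetical, as is the Gaussian environment. The mean comparison is only sensible when all suboptimal arms have similar means, which is exactly the trade-off the abstract warns about.

```python
import random


def build_groups(K):
    """Assumed grouping: group j holds every arm whose index has bit j set."""
    assert K >= 2 and K & (K - 1) == 0, "this sketch assumes K is a power of two"
    n_bits = K.bit_length() - 1  # equals log2(K) for a power of two
    return [[i for i in range(K) if (i >> j) & 1] for j in range(n_bits)]


def pull_subset(means, subset, rng, sigma=1.0):
    """Hypothetical environment: one probe returns the sample average of the
    Gaussian rewards of the arms in `subset`."""
    return sum(rng.gauss(means[i], sigma) for i in subset) / len(subset)


def identify_best_arm(means, budget, seed=0):
    """Spend `budget` probes evenly over each group and its complement, then
    read the per-group decisions as the bits of the best arm's index."""
    K = len(means)
    rng = random.Random(seed)
    groups = build_groups(K)
    pulls = budget // (2 * len(groups))
    assert pulls >= 1, "budget too small for this sketch"
    best = 0
    for j, group in enumerate(groups):
        members = set(group)
        complement = [i for i in range(K) if i not in members]
        g_avg = sum(pull_subset(means, group, rng) for _ in range(pulls)) / pulls
        c_avg = sum(pull_subset(means, complement, rng) for _ in range(pulls)) / pulls
        # Stand-in for the likelihood ratio test: decide that the best arm is
        # in the group if the group's sample mean exceeds its complement's.
        if g_avg > c_avg:
            best |= 1 << j
    return best


# Usage: one arm clearly better than seven identical suboptimal arms.
means = [0.0] * 8
means[5] = 2.0
print(identify_best_arm(means, budget=6000))
```

With $K = 8$ this uses three groups, so the budget is split over six subsets rather than eight individual arms. When the suboptimal means differ substantially, the simple comparison above can pick the wrong side of a split, which is why a more careful test and an analysis of hardness are needed.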

Problem

Research questions and friction points this paper is trying to address.

Multi-armed Bandit Problem

Budget Constraint

Optimal Selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal Selection Algorithm

Budget-Constrained Testing

Group Testing Strategy
