🤖 AI Summary
This paper introduces "Optimal Arm Retention" (OAR), a novel pure-exploration problem in stochastic multi-armed bandits: retain a subset of $m$ arms out of $n$ that provably contains the globally optimal arm, subject to budget or dynamic constraints. We formalize OAR as an independent learning objective and propose a unified framework that jointly optimizes retention confidence and decision-switching cost. The method combines UCB-style upper confidence bounds, Bayesian posterior updates, and sequential significance testing, and comes with theoretical guarantees on regret. On synthetic benchmarks and real-world delayed-feedback settings, it improves retention accuracy by 37% and reduces decision switches by 52% relative to classical Best Arm Identification algorithms, which improves online adaptability and deployment efficiency.
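To make the retention objective concrete, the following is a minimal sketch of keeping $m$ of $n$ Bernoulli arms under a fixed sampling budget, using confidence-bound elimination. It is not the paper's algorithm: the function name `retain_arms`, the Hoeffding-style radius, the union-bound constant inside the logarithm, and the round-robin sampling rule are all assumptions chosen for illustration, and the Bayesian updates, sequential tests, and switching-cost term mentioned in the summary are omitted.

```python
import math
import random


def retain_arms(means, m, budget, delta=0.05, seed=0):
    """Illustrative sketch (hypothetical helper, not the paper's method).

    Simulates Bernoulli arms with the given true means and returns the
    sorted indices of m retained arms.
    """
    n = len(means)
    if not 1 <= m <= n:
        raise ValueError("m must satisfy 1 <= m <= n")
    rng = random.Random(seed)
    alive = list(range(n))   # arms still under consideration
    sums = [0.0] * n         # total observed reward per arm
    counts = [0] * n         # number of pulls per arm
    t = 0                    # completed round-robin rounds

    # Pull every surviving arm once per round while budget allows.
    while len(alive) > m and budget >= len(alive):
        for i in alive:
            sums[i] += 1.0 if rng.random() < means[i] else 0.0
            counts[i] += 1
        budget -= len(alive)
        t += 1

        # Assumed anytime Hoeffding-style confidence radius.
        radius = math.sqrt(math.log(4 * n * t * t / delta) / (2 * t))
        best_lcb = max(sums[i] / t - radius for i in alive)

        # Arms whose upper bound falls below the best lower bound are
        # unlikely to be optimal; drop the worst first, never below m.
        doomed = sorted(
            (i for i in alive if sums[i] / t + radius < best_lcb),
            key=lambda i: sums[i] / t,
        )
        for i in doomed[: len(alive) - m]:
            alive.remove(i)

    # If the budget ran out first, fall back to the top-m empirical means.
    alive.sort(key=lambda i: sums[i] / max(counts[i], 1), reverse=True)
    return sorted(alive[:m])


if __name__ == "__main__":
    kept = retain_arms([0.1, 0.2, 0.9, 0.3, 0.15], m=2, budget=5000)
    print("retained arms:", kept)
```

In this sketch the retained set only ever shrinks, so the number of changes to the set is at most $n - m$. That is one simple way to keep decision switches low, although the summary does not say how the paper measures or controls switching cost.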