Best Arm Identification with Possibly Biased Offline Data

📅 2025-05-29
🤖 AI Summary
This paper studies fixed-confidence best-arm identification (BAI) when the learner also has offline data that may be biased, meaning the offline and online reward distributions can differ, as happens in settings such as clinical trials. We first prove an impossibility result for adaptive algorithms that have no prior knowledge of a bound on the bias between the online and offline distributions. To address this, we propose LUCB-H, which adds an auxiliary bias correction to LUCB-style confidence bounds so that offline and online data are balanced adaptively. The analysis shows that LUCB-H matches the sample complexity of standard LUCB when the offline data is misleading and significantly outperforms it when the offline data is helpful. We also derive an instance-dependent lower bound that matches the upper bound of LUCB-H in certain scenarios, and numerical experiments support the algorithm's robustness and adaptability.

📝 Abstract
We study the best arm identification (BAI) problem with potentially biased offline data in the fixed confidence setting, which commonly arises in real-world scenarios such as clinical trials. We prove an impossibility result for adaptive algorithms without prior knowledge of the bias bound between online and offline distributions. To address this, we propose the LUCB-H algorithm, which introduces adaptive confidence bounds by incorporating an auxiliary bias correction to balance offline and online data within the LUCB framework. Theoretical analysis shows that LUCB-H matches the sample complexity of standard LUCB when offline data is misleading and significantly outperforms it when offline data is helpful. We also derive an instance-dependent lower bound that matches the upper bound of LUCB-H in certain scenarios. Numerical experiments further demonstrate the robustness and adaptability of LUCB-H in effectively incorporating offline data.
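The abstract does not spell out the LUCB-H confidence bounds, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes rewards in [0, 1], a known per-arm bias bound (`bias_bound`), Hoeffding-style radii with a heuristic union bound, and a leader chosen by interval midpoint. The names `radius`, `interval`, and `lucb_offline_sketch` are hypothetical. Each arm's interval is the intersection of an online-only interval and a pooled interval widened by the worst-case shift the offline samples could cause.

```python
import math
import random


def radius(n, delta):
    """Hoeffding-style radius for the mean of n rewards in [0, 1]."""
    return math.inf if n == 0 else math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def interval(on_sum, on_n, off_sum, off_n, bias_bound, delta):
    """Intersect the online-only interval with a bias-corrected pooled one."""
    mean_on = on_sum / on_n
    r_on = radius(on_n, delta)
    lo, hi = mean_on - r_on, mean_on + r_on
    if off_n > 0:
        n = on_n + off_n
        pooled = (on_sum + off_sum) / n
        # Assumption: offline samples shift the pooled mean by at most
        # bias_bound * off_n / n, so the pooled radius is widened by that much.
        r = radius(n, delta) + bias_bound * off_n / n
        lo, hi = max(lo, pooled - r), min(hi, pooled + r)
    return lo, hi


def lucb_offline_sketch(pull, offline, bias_bound, delta=0.05, max_rounds=100_000):
    """LUCB-style loop (needs >= 2 arms); returns (chosen arm, online pulls)."""
    k = len(offline)
    off_sum = [float(sum(x)) for x in offline]
    off_n = [len(x) for x in offline]
    on_sum = [pull(a) for a in range(k)]  # one initial online pull per arm
    on_n = [1] * k
    leader = 0
    for t in range(1, max_rounds + 1):
        d = delta / (4.0 * k * t * t)  # heuristic union bound, not the paper's
        bounds = [interval(on_sum[a], on_n[a], off_sum[a], off_n[a], bias_bound, d)
                  for a in range(k)]
        # Leader: largest interval midpoint (a choice made for this sketch).
        leader = max(range(k), key=lambda a: bounds[a][0] + bounds[a][1])
        # Challenger: largest upper bound among the remaining arms.
        challenger = max((a for a in range(k) if a != leader),
                         key=lambda a: bounds[a][1])
        if bounds[leader][0] >= bounds[challenger][1]:
            break  # leader's lower bound clears every other upper bound
        for a in (leader, challenger):
            on_sum[a] += pull(a)
            on_n[a] += 1
    return leader, sum(on_n)


if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.8, 0.2]  # hypothetical Bernoulli arms
    pull = lambda a: 1.0 if rng.random() < means[a] else 0.0
    helpful = [[1.0] * 160 + [0.0] * 40, [1.0] * 40 + [0.0] * 160]
    print(lucb_offline_sketch(pull, helpful, bias_bound=0.0))
    print(lucb_offline_sketch(pull, [[], []], bias_bound=0.0))
```

In this sketch, a small `bias_bound` lets the pooled interval dominate, so informative offline data can end the search after very few online pulls. A large `bias_bound` makes the pooled interval wider than the online-only one, and the loop then behaves like plain LUCB, which mirrors the qualitative behavior the abstract describes.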
Problem

Research questions and friction points this paper is trying to address.

Identifying the best arm when offline data may be biased
Correcting for offline bias with the proposed LUCB-H algorithm
Analyzing sample complexity and performance bounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

LUCB-H algorithm with an auxiliary bias correction
Adaptive confidence bounds that balance offline and online data
Instance-dependent lower bound that matches the LUCB-H upper bound in certain scenarios