Cost-Aware Optimal Pairwise Pure Exploration

📅 2025-03-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies pure-exploration multi-armed bandits with arm-specific sampling costs. The goal is to correctly identify the pairwise relationships between targeted arm pairs while minimizing the cumulative sampling cost, rather than only the stopping time (sample complexity). The paper sets up a general cost-aware pairwise pure-exploration framework and derives a lower bound on performance. It then proposes CAET (Cost-Aware Pairwise Exploration Task), an algorithm built on the track-and-stop principle with a design that handles arm-specific costs, including costs that may be zero, and proves that its performance approaches the lower bound asymptotically. Experiments under various settings support the effectiveness and efficiency of CAET. The framework also covers special cases, including an extension to regret minimization.

📝 Abstract

Pure exploration is one of the fundamental problems in multi-armed bandits (MAB). However, existing works mostly focus on specific pure exploration tasks, without a holistic view of the general pure exploration problem. This work fills this gap by introducing a versatile framework to study pure exploration, with a focus on identifying the pairwise relationships between targeted arm pairs. Moreover, unlike existing works that only optimize the stopping time (i.e., sample complexity), this work considers arms that are associated with potentially different costs and aims to optimize the cumulative cost incurred during learning. Under the general framework of pairwise pure exploration with arm-specific costs, a performance lower bound is derived. Then, a novel algorithm, termed CAET (Cost-Aware Pairwise Exploration Task), is proposed. CAET builds on the track-and-stop principle with a novel design to handle the arm-specific costs, which can potentially be zero and thus represent a very challenging case. Theoretical analyses prove that the performance of CAET approaches the lower bound asymptotically. Special cases are further discussed, including an extension to regret minimization, which is another major focus of MAB. The effectiveness and efficiency of CAET are also verified through experimental results under various settings.
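To make the difference between stopping time and cumulative cost concrete, here is a minimal sketch for a toy case that is not taken from the paper: comparing two Gaussian arms with known variance. It assumes the standard requirement gap^2 / (2 sigma^2 (1/n1 + 1/n2)) >= log(1/delta) as a simplified confidence condition, and the function name `cost_aware_allocation` and its parameters are hypothetical. Minimizing c1*n1 + c2*n2 under that condition gives pull counts proportional to 1/sqrt(cost), which also shows why a zero cost is a degenerate case for this naive calculation.

```python
import math


def cost_aware_allocation(costs, gap, delta, sigma=1.0):
    """Toy two-arm Gaussian comparison (illustrative, not the paper's bound).

    Returns (n1, n2, optimal_cost, uniform_cost), where the pull counts
    minimize c1*n1 + c2*n2 subject to 1/n1 + 1/n2 <= budget.
    """
    c1, c2 = costs
    if c1 <= 0 or c2 <= 0:
        # With a zero cost this closed form diverges (n_i -> infinity).
        raise ValueError("this toy formula needs strictly positive costs")

    # Simplified confidence requirement (assumption): the largest value of
    # 1/n1 + 1/n2 that still allows a decision at confidence level delta.
    budget = gap ** 2 / (2 * sigma ** 2 * math.log(1 / delta))

    # Lagrangian solution: n_i is proportional to 1 / sqrt(c_i).
    s = math.sqrt(c1) + math.sqrt(c2)
    n1 = s / (math.sqrt(c1) * budget)
    n2 = s / (math.sqrt(c2) * budget)
    optimal_cost = c1 * n1 + c2 * n2

    # Baseline that ignores costs: equal pulls meeting the same requirement.
    n_uniform = 2 / budget
    uniform_cost = (c1 + c2) * n_uniform
    return n1, n2, optimal_cost, uniform_cost


if __name__ == "__main__":
    n1, n2, opt, uni = cost_aware_allocation((1.0, 4.0), gap=0.5, delta=0.05)
    print(f"pulls: {n1:.1f}, {n2:.1f}; cost-aware: {opt:.1f}; uniform: {uni:.1f}")
```

In this toy setting the cheaper arm is pulled more often, and the cost-aware allocation is never more expensive than the equal-pull baseline, even though both satisfy the same confidence condition.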

Problem

Research questions and friction points this paper is trying to address.

Develops a framework for general pure exploration in multi-armed bandits.

Focuses on optimizing cumulative cost, not just stopping time.

Introduces the CAET algorithm to handle arm-specific costs effectively.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces a versatile framework for pure exploration.

Optimizes cumulative cost, not just stopping time.

Proposes the CAET algorithm, which handles arm-specific costs.
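The points above can be illustrated with a toy sampling loop in the spirit of track-and-stop. This is a sketch under stated assumptions and is not the CAET algorithm: it handles only two Gaussian arms with strictly positive costs, tracks hypothetical target proportions proportional to 1/sqrt(cost), and stops with a heuristic threshold log((1 + log t) / delta). The function name `run_toy_cost_aware_comparison` and all parameters are made up for illustration.

```python
import math
import random


def run_toy_cost_aware_comparison(means, costs, delta=0.05, sigma=1.0,
                                  seed=0, max_pulls=100_000):
    """Toy tracking loop for deciding which of two Gaussian arms is better."""
    rng = random.Random(seed)

    # Assumed target proportions: pull cheaper arms more often.
    inv = [1.0 / math.sqrt(c) for c in costs]
    target = [v / sum(inv) for v in inv]

    counts = [0, 0]
    sums = [0.0, 0.0]
    total_cost = 0.0

    for t in range(1, max_pulls + 1):
        if t <= 2:
            arm = t - 1  # pull each arm once to initialize the estimates
        else:
            # Tracking: pull the arm that lags its target share the most.
            arm = max(range(2), key=lambda i: target[i] * t - counts[i])

        reward = rng.gauss(means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
        total_cost += costs[arm]  # pay the arm-specific cost per pull

        if min(counts) == 0:
            continue

        est = [sums[i] / counts[i] for i in range(2)]
        # Generalized likelihood ratio statistic for two Gaussian arms.
        glr = (est[0] - est[1]) ** 2 / (
            2 * sigma ** 2 * (1 / counts[0] + 1 / counts[1]))
        threshold = math.log((1 + math.log(t)) / delta)  # heuristic (assumption)
        if glr >= threshold:
            better = 0 if est[0] > est[1] else 1
            return {"better_arm": better, "pulls": counts, "cost": total_cost}

    # No decision within the pull limit.
    return {"better_arm": None, "pulls": counts, "cost": total_cost}


if __name__ == "__main__":
    print(run_toy_cost_aware_comparison(means=(0.5, 0.0), costs=(1.0, 4.0)))
```

The returned `cost` is the quantity being minimized in the cost-aware setting, while `sum(pulls)` corresponds to the stopping time that conventional pure-exploration analyses focus on.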

🔎 Similar Papers

Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning