🤖 AI Summary
This paper addresses online sequential decision-making under test-cost sensitivity, a critical challenge in applications such as medical diagnosis and recommendation systems. Method: We propose the first combinatorial multi-armed bandit (CMAB) framework that explicitly incorporates stochastic test costs into its modeling. Our approach unifies Bayesian strategies—including Thompson Sampling and BayesUCB—within a cost-aware CMAB setting, leveraging Bayesian posterior inference to dynamically balance information acquisition cost against decision reward, thereby enabling adaptive, low-cost, high-value test selection. Contribution/Results: We establish the first sublinear regret bound for Thompson Sampling in cost-aware combinatorial bandits. Empirical evaluation on real-world diagnostic and recommendation tasks demonstrates an average 37% reduction in testing cost while preserving decision accuracy, validating both theoretical guarantees and practical efficacy.
📝 Abstract
Online decision making plays a crucial role in numerous real-world applications. In many scenarios, the decision is made based on performing a sequence of tests on the incoming data points. However, performing all tests can be expensive and is not always possible. In this paper, we provide a novel formulation of the online decision making problem based on combinatorial multi-armed bandits and take the (possibly stochastic) cost of performing tests into account. Based on this formulation, we provide a new framework for cost-efficient online decision making which can utilize posterior sampling or BayesUCB for exploration. We provide a theoretical analysis of Thompson Sampling for cost-efficient online decision making, and present various experimental results that demonstrate the applicability of our framework to real-world problems.