Cost-Ordered Feasibility for Multi-Armed Bandits with Cost Subsidy

📅 2026-05-07
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the multi-armed bandit problem with cost subsidy, in which the cost of each arm is known and the goal is to minimize cost subject to a minimum-reward constraint that is defined relative to the unknown optimal reward. To tackle this setting, the authors propose the Cost-Ordered Feasibility (COF) algorithm, which uses adaptive sampling and a feasibility-checking mechanism to aggregate information across all arms and to prioritize exploration of low-cost feasible arms. Theoretically, the study establishes instance-dependent lower bounds that strictly generalize prior bounds, and it proves instance-dependent upper bounds on COF's cost and quality regret that are qualitatively better than those of existing methods. Empirically, experiments on the MovieLens and Goodreads datasets and on synthetic instances show that COF outperforms existing baselines in both cumulative cost and quality regret.
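The feasibility check described above can be illustrated with a small confidence-bound sketch. This is not the authors' COF algorithm, whose details are not given here. It assumes rewards in [0, 1], a Hoeffding-style radius sqrt(2 ln t / n), and a relative threshold (1 - alpha) * mu* with subsidy factor alpha; the names `cost_ordered_check` and `confidence_radius` are hypothetical. The sketch scans arms from cheapest to most expensive, skips arms whose optimistic reward misses even the most lenient threshold estimate, and reports whether the first surviving arm is certified feasible or still needs samples. Because mu* is unknown, both threshold estimates are built from the bounds of every arm, which is one simple way to pool information across arms.

```python
import math
from typing import List, Sequence, Tuple


def confidence_radius(n: int, t: int) -> float:
    """Hoeffding-style radius for rewards in [0, 1] (assumed, not from the paper)."""
    if n == 0:
        return math.inf  # an unsampled arm is completely uncertain
    return math.sqrt(2.0 * math.log(max(t, 2)) / n)


def cost_ordered_check(
    costs: Sequence[float],
    sums: Sequence[float],
    counts: Sequence[int],
    t: int,
    alpha: float,
) -> Tuple[int, str]:
    """Hypothetical cost-ordered feasibility check (illustration only, not COF).

    Returns (arm, status) for the cheapest arm not yet ruled infeasible, where
    status is "feasible" if its lower bound clears the strictest threshold
    estimate and "undetermined" if more samples are needed.
    """
    ucb: List[float] = []
    lcb: List[float] = []
    for s, n in zip(sums, counts):
        mean = s / n if n > 0 else 0.0
        r = confidence_radius(n, t)
        ucb.append(min(1.0, mean + r))  # clip to the assumed reward range [0, 1]
        lcb.append(max(0.0, mean - r))

    # mu* is unknown; with high probability it lies in [max LCB, max UCB],
    # so the relative threshold (1 - alpha) * mu* is bracketed using all arms.
    strict_threshold = (1.0 - alpha) * max(ucb)
    lenient_threshold = (1.0 - alpha) * max(lcb)

    for i in sorted(range(len(costs)), key=lambda j: costs[j]):
        if ucb[i] < lenient_threshold:
            continue  # confidently infeasible: move on to the next-cheapest arm
        if lcb[i] >= strict_threshold:
            return i, "feasible"
        return i, "undetermined"
    # Not reached for 0 <= alpha <= 1 and rewards in [0, 1]: the arm with the
    # largest LCB always satisfies ucb >= lenient_threshold.
    raise RuntimeError("no candidate arm found")
```

For example, with costs [1.0, 2.0, 3.0], empirical means [0.5, 0.85, 0.9], alpha = 0.1, and 10^6 samples per arm, the check skips arm 0 and certifies arm 1 as feasible; with no samples at all it returns (0, "undetermined"), signalling that the cheapest arm should be explored first.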
πŸ“ Abstract
The classic multi-armed bandit (MAB) problem tackles the challenge of accruing maximum reward while making decisions under uncertainty. In many applications, however, the goal is to minimize cost subject to a constraint on the minimum permissible reward, an objective captured by multi-armed bandits with cost subsidy (MAB-CS). This paper focuses on the setting where the quality (reward) constraint is specified relative to the unknown best reward and the cost of each arm is known. We characterize the expected number of sub-optimal samples required by any policy by proving instance-dependent lower bounds that offer new insight into the problem and strictly generalize prior bounds. We then propose an algorithm called Cost-Ordered Feasibility (COF) that leverages this insight and intelligently combines samples from all arms to gauge the feasibility of a cheap arm. Next, we analyze COF to establish instance-dependent upper bounds on its expected cumulative cost and quality regret, both measured relative to the cheapest feasible arm. Finally, we empirically validate the merits of COF by comparing it with baselines from the literature through extensive simulation experiments on the MovieLens and Goodreads datasets as well as representative synthetic instances. Not only does our paper develop qualitatively better theoretical regret upper bounds, but COF also convincingly demonstrates improved empirical performance.
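To make the objective concrete, the following sketch computes the cheapest feasible arm and the two regret notions for a fixed sequence of pulls. It assumes the definitions commonly used for MAB-CS, which may differ in detail from the paper's: an arm i is feasible if mu_i >= (1 - alpha) * mu*, cost regret sums max(c_pulled - c_target, 0), and quality regret sums max((1 - alpha) * mu* - mu_pulled, 0), where the target is the cheapest feasible arm. The function names and the cost tie-breaking rule are hypothetical.

```python
from typing import Sequence, Tuple


def cheapest_feasible_arm(means: Sequence[float], costs: Sequence[float], alpha: float) -> int:
    """Index of the cheapest arm whose mean reward is at least (1 - alpha) * mu*."""
    threshold = (1.0 - alpha) * max(means)
    feasible = [i for i, m in enumerate(means) if m >= threshold]
    # Ties in cost are broken in favour of the higher-reward arm (an assumption).
    return min(feasible, key=lambda i: (costs[i], -means[i]))


def cumulative_regrets(
    pulls: Sequence[int], means: Sequence[float], costs: Sequence[float], alpha: float
) -> Tuple[float, float]:
    """Cost regret and quality regret of a pull sequence (assumed MAB-CS definitions)."""
    target = cheapest_feasible_arm(means, costs, alpha)
    threshold = (1.0 - alpha) * max(means)
    # Cost regret: extra cost paid relative to the cheapest feasible arm.
    cost_regret = sum(max(costs[a] - costs[target], 0.0) for a in pulls)
    # Quality regret: shortfall of the pulled arm's mean below the relative threshold.
    quality_regret = sum(max(threshold - means[a], 0.0) for a in pulls)
    return cost_regret, quality_regret


means = [0.5, 0.85, 0.9]
costs = [1.0, 2.0, 3.0]
print(cheapest_feasible_arm(means, costs, alpha=0.1))  # 1
print(cumulative_regrets([0, 1, 2, 1], means, costs, alpha=0.1))
```

In this toy instance, pulling the expensive best arm (arm 2) only adds cost regret, while pulling the cheap infeasible arm (arm 0) only adds quality regret, which is why a good policy must pin down feasibility of cheap arms without over-sampling either kind of mistake.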
Problem

Research questions and friction points this paper is trying to address.

multi-armed bandits
cost subsidy
reward constraint
cost minimization
feasibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-armed bandits with cost subsidy
instance-dependent bounds
Cost-Ordered Feasibility
quality regret
feasibility verification
🔎 Similar Papers
2024-06-05 · Neural Information Processing Systems · Citations: 1