Optimal Data Driven Resource Allocation under Multi-Armed Bandit Observations

📅 2018-11-30

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper studies the resource-constrained multi-armed bandit (MAB) problem, in which arm activations are limited by resources that are replenished at a constant rate and the objective is to minimize long-run cumulative regret. It derives an asymptotic lower bound on the regret of feasible uniformly fast policies and constructs policies that attain this bound under pertinent conditions, which the authors present as the first asymptotically optimal strategy for this setting. Explicit policies are given for three cases: Normal distributions with unknown means and known variances, Normal distributions with unknown means and unknown variances, and arbitrary discrete distributions with finite support.

📝 Abstract

This paper introduces the first asymptotically optimal strategy for a multi-armed bandit (MAB) model under side constraints. The side constraints model situations in which bandit activations are limited by the availability of certain resources that are replenished at a constant rate. The main result is the derivation of an asymptotic lower bound for the regret of feasible uniformly fast policies, together with the construction of policies that achieve this lower bound under pertinent conditions. Further, we provide the explicit form of such policies for three cases: Normal distributions with unknown means and known variances, Normal distributions with unknown means and unknown variances, and arbitrary discrete distributions with finite support.
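To make the notion of regret under a replenishment constraint concrete, the following is a minimal sketch of the full-information benchmark against which regret would be measured. It rests on assumptions that the abstract does not state: a single resource, a known per-activation cost for each arm, exactly one arm activated per period, and a long-run average cost that may not exceed the replenishment rate. The function name `best_feasible_mix` and its parameters are hypothetical and are not taken from the paper. Under these assumptions the benchmark is a small linear program whose optimum mixes at most two arms, so the sketch simply enumerates those candidates.

```python
from itertools import combinations


def best_feasible_mix(means, costs, rate):
    """Best long-run average reward when the arm means are known.

    Assumed model (not from the paper): activating arm i costs costs[i]
    units of one resource, the resource is replenished at `rate` units
    per period, and the long-run average cost must not exceed `rate`.
    Returns (value, weights), where weights[i] is the long-run fraction
    of periods in which arm i is activated.
    """
    n = len(means)
    best_value, best_weights = None, None

    def consider(weights):
        # Keep the candidate mix with the highest average reward so far.
        nonlocal best_value, best_weights
        value = sum(w * m for w, m in zip(weights, means))
        if best_value is None or value > best_value:
            best_value, best_weights = value, weights

    # Candidate 1: play a single affordable arm all the time.
    for i in range(n):
        if costs[i] <= rate:
            weights = [0.0] * n
            weights[i] = 1.0
            consider(weights)

    # Candidate 2: mix a cheap arm with an expensive arm so that the
    # average cost equals the replenishment rate exactly.
    for i, j in combinations(range(n), 2):
        lo, hi = (i, j) if costs[i] <= costs[j] else (j, i)
        if costs[lo] <= rate < costs[hi]:
            p_hi = (rate - costs[lo]) / (costs[hi] - costs[lo])
            weights = [0.0] * n
            weights[lo] = 1.0 - p_hi
            weights[hi] = p_hi
            consider(weights)

    if best_value is None:
        raise ValueError("no arm is affordable at this replenishment rate")
    return best_value, best_weights


if __name__ == "__main__":
    value, weights = best_feasible_mix([1.0, 2.0], [1.0, 3.0], 2.0)
    print(value, weights)
```

In this toy instance the better arm is too expensive to play every period, so the benchmark alternates between the two arms. With a benchmark value `z` in hand, the regret of a learning policy after `n` periods would be `n * z` minus its expected total reward, again under the assumed model.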

Problem

Research questions and friction points this paper is trying to address.

Optimal resource allocation strategy

Multi-armed bandit with constraints

Asymptotic lower bound for regret

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal resource allocation strategy

Multi-armed bandit model

Asymptotic lower bound derivation

🔎 Similar Papers

Multi-Player Approaches for Dueling Bandits

2024-05-25 · arXiv.org · Citations: 1
