🤖 AI Summary
This paper addresses monotone stochastic optimization problems with unknown distributions—such as prophet inequalities and Pandora’s box—where classical approaches rely on full distributional knowledge.
Method: We propose the first unified semi-bandit online learning framework that learns and approximates the optimal policy solely from observed samples of the probed random variables, without any prior distributional information. Our approach integrates semi-bandit feedback modeling, a monotonicity-aware stochastic probing strategy, and a refined regret analysis.
Contribution/Results: We establish a near-optimal regret bound of $O(\sqrt{T \log T})$, matching known lower bounds up to logarithmic factors for multiple canonical problems in both the full-information and pure-bandit settings. To our knowledge, this is the first general result to remove the dependence on distributional priors in this class of stochastic optimization problems, providing both a theoretical foundation and an efficient algorithmic pathway for online approximation in distribution-agnostic environments.
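To make the "learning from samples" idea concrete for one of the canonical problems above: under known distributions, Pandora's box is solved with Weitzman's reservation values, where box $i$'s value $\sigma_i$ solves $\mathbb{E}[(X_i - \sigma_i)^+] = c_i$. A minimal, illustrative sketch (not the paper's algorithm; the function name and bisection approach are our own) of computing such a value from an empirical sample set:

```python
def reservation_value(samples, cost, tol=1e-9):
    """Solve mean(max(x - sigma, 0)) == cost for sigma by bisection,
    using empirical samples in place of the unknown distribution.
    The empirical surplus is non-increasing in sigma, so bisection applies."""
    def surplus(sigma):
        return sum(max(x - sigma, 0.0) for x in samples) / len(samples)

    # surplus(lo) >= mean - lo = cost, and surplus(hi) = 0 <= cost,
    # so the root is bracketed in [lo, hi].
    lo = sum(samples) / len(samples) - cost
    hi = max(samples)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if surplus(mid) >= cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With samples concentrated at a single value $v$, the solution is $v - c$; e.g. `reservation_value([2.0] * 5, 0.5)` returns `1.5`. In a semi-bandit loop, each round's probed boxes contribute fresh samples, and the reservation values are recomputed from the growing empirical sets.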
📝 Abstract
Stochastic optimization is a widely used approach for optimization under uncertainty, where uncertain input parameters are modeled by random variables. Exact or approximation algorithms have been obtained for several fundamental problems in this area. However, a significant limitation of this approach is that it requires full knowledge of the underlying probability distributions. Can we still get good (approximation) algorithms if these distributions are unknown, and the algorithm needs to learn them through repeated interactions? In this paper, we resolve this question for a large class of “monotone” stochastic problems, by providing a generic online learning algorithm with $\sqrt{T \log T}$ regret relative to the best approximation algorithm (under known distributions). Importantly, our online algorithm works in a semi-bandit setting, where in each period, the algorithm only observes samples from the random variables that were actually probed. Our framework applies to several fundamental problems in stochastic optimization such as prophet inequality, Pandora's box, stochastic knapsack, stochastic matchings and stochastic submodular optimization.
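The semi-bandit feedback described in the abstract can be illustrated on the prophet inequality. The sketch below is not the paper's algorithm: it learns the classic half-of-expected-max threshold from empirical samples, and in each round it only records samples for items that were actually probed before stopping (the exploration rate `eps` and the resampling estimator are our own illustrative choices).

```python
import random

def semi_bandit_prophet(dists, T, seed=0, eps=0.05):
    """Illustrative semi-bandit loop for the prophet inequality:
    probe items in order, accept the first value above a threshold
    learned only from previously probed (hence observed) samples."""
    rng = random.Random(seed)
    n = len(dists)
    samples = [[] for _ in range(n)]  # empirical samples per item
    total = 0.0
    for t in range(T):
        vals = [d() for d in dists]  # this round's hidden realizations
        if t == 0 or rng.random() < eps:
            # exploration round: probe every item, fall back to the last one
            for i in range(n):
                samples[i].append(vals[i])
            total += vals[-1]
            continue
        # estimate E[max_i X_i] by resampling from the empirical samples,
        # then use the classic half-of-expected-max threshold
        est = sum(max(rng.choice(samples[i]) for i in range(n))
                  for _ in range(32)) / 32
        tau = est / 2.0
        for i in range(n):
            samples[i].append(vals[i])  # semi-bandit: probed => observed
            if vals[i] >= tau:
                total += vals[i]
                break  # items after i are never probed or observed
    return total / T
```

For three i.i.d. Uniform(0, 1) items, $\mathbb{E}[\max] = 0.75$, so the learned threshold should hover near $0.375$ and the average per-round reward settles well above the single-item baseline of $0.5$.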