Thompson Sampling for Repeated Newsvendor

📅 2025-02-14
🤖 AI Summary
This paper studies the repeated newsvendor problem, a canonical online inventory decision problem, under censored feedback. Classical Thompson Sampling (TS) has lacked theoretical guarantees under censoring; the paper establishes a frequentist regret bound for TS in this setting that is optimal up to logarithmic factors, and shows that TS implicitly corrects censoring bias by adaptively adjusting order quantities. Methodologically, demand is modeled as Weibull-distributed with a Gamma prior, which permits Bayesian updating without restrictive prior assumptions. The analysis shows that TS yields near-optimal ordering decisions. Numerical experiments show that TS outperforms benchmark methods, including UCB, online convex optimization, and myopic Bayesian dynamic programming.
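
To make the Weibull/Gamma updating concrete, here is a short worked derivation. It is a sketch under assumptions this page does not confirm: the Weibull shape k is taken as known, the unknown parameter θ enters as a rate, and the symbols y_t, δ_t, α, β are introduced here for illustration only.

```latex
% Assumed parameterization (not confirmed by the source): known shape k, unknown rate \theta.
\begin{align*}
\Pr(D > d \mid \theta) &= e^{-\theta d^{k}}, \qquad \theta \sim \mathrm{Gamma}(\alpha, \beta), \\
y_t &= \min(D_t, q_t), \qquad \delta_t = \mathbf{1}\{D_t < q_t\}, \\
L(\theta) &= \prod_{t=1}^{n} \bigl(\theta k y_t^{k-1}\bigr)^{\delta_t} e^{-\theta y_t^{k}}, \\
\theta \mid \text{data} &\sim \mathrm{Gamma}\Bigl(\alpha + \sum_{t=1}^{n} \delta_t,\; \beta + \sum_{t=1}^{n} y_t^{k}\Bigr).
\end{align*}
```

Under these assumptions a censored period (δ_t = 0) raises the rate parameter β without raising the shape parameter α. This lowers the posterior mean of θ, so sampled demand tends to be larger, which is one way to read the adaptive order adjustment described in the summary.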

📝 Abstract
In this paper, we investigate the performance of Thompson Sampling (TS) for online learning with censored feedback, focusing primarily on the classic repeated newsvendor model, a foundational framework in inventory management, and demonstrating how our techniques can be naturally extended to a broader class of problems. We model demand using a Weibull distribution and initialize TS with a Gamma prior to dynamically adjust order quantities. Our analysis establishes optimal (up to logarithmic factors) frequentist regret bounds for TS without imposing restrictive prior assumptions. More importantly, it yields novel and highly interpretable insights on how TS addresses the exploration-exploitation trade-off in the repeated newsvendor setting. Specifically, our results show that when past order quantities are sufficiently large to overcome censoring, TS accurately estimates the unknown demand parameters, leading to near-optimal ordering decisions. Conversely, when past orders are relatively small, TS automatically increases future order quantities to gather additional demand information. Extensive numerical simulations further demonstrate that TS outperforms more conservative and widely used approaches such as online convex optimization, upper confidence bounds, and myopic Bayesian dynamic programming. This study also lays the foundation for exploring general online learning problems with censored feedback.
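
The ordering loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes a Weibull demand model with known shape `k` and survival function exp(-theta * d**k), a Gamma prior on the unknown rate `theta`, and a newsvendor critical ratio `rho`. The function name `thompson_newsvendor` and all parameter names and default values are hypothetical.

```python
import numpy as np


def thompson_newsvendor(T=2000, k=2.0, theta_true=0.5, rho=0.8,
                        alpha0=1.0, beta0=1.0, seed=0):
    """Thompson Sampling for a repeated newsvendor with censored sales.

    Assumed model (illustrative only): P(D > d) = exp(-theta * d**k) with
    known shape k and unknown rate theta ~ Gamma(alpha, rate=beta).
    """
    rng = np.random.default_rng(seed)
    alpha, beta = alpha0, beta0
    # The optimal order solves 1 - exp(-theta * q**k) = rho.
    z = -np.log(1.0 - rho)
    orders = np.empty(T)
    for t in range(T):
        # Sample a rate from the current posterior (NumPy uses scale = 1/rate).
        theta_s = rng.gamma(shape=alpha, scale=1.0 / beta)
        # Order the critical quantile of demand under the sampled rate.
        q = (z / theta_s) ** (1.0 / k)
        # Draw true demand; the learner never sees it directly.
        demand = (rng.exponential(1.0) / theta_true) ** (1.0 / k)
        sales = min(demand, q)  # censored observation
        # Conjugate update: count uncensored periods, accumulate sales**k.
        alpha += float(demand < q)
        beta += sales ** k
        orders[t] = q
    q_star = (z / theta_true) ** (1.0 / k)  # order under the true rate
    return orders, q_star, alpha / beta
```

Calling `orders, q_star, post_mean = thompson_newsvendor()` returns the order path, the order quantity under the true rate, and the final posterior mean of `theta`. Comparing late entries of `orders` with `q_star` gives a rough sense of convergence under these assumed settings.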

Problem

Research questions and friction points this paper is trying to address.

Optimize inventory using Thompson Sampling
Address exploration-exploitation trade-off
Handle censored feedback in online learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Thompson Sampling for censored feedback
Gamma prior with Weibull demand
Optimal frequentist regret bounds