🤖 AI Summary
This work addresses inventory control in which the order quantity itself censors demand observations, and proposes the in-context generative posterior sampling (ICGPS) framework, which combines in-context learning with generative posterior sampling. Using ChronosFlow, an offline-pretrained model that pairs a frozen time-series Transformer backbone with a trainable conditional normalizing-flow head, the method autoregressively imputes latent demand in a censoring-consistent way and then takes the oracle action on the completed demand. Theoretically, its Bayesian regret is bounded by that of Thompson sampling with the ideal completion kernel plus a deployment penalty that scales as $\sqrt{T}$ times the square root of the completion mismatch; under coverage and stability assumptions, this mismatch is controlled by offline predictive quality, so offline prediction transfers to online decision-making. Empirically, ICGPS matches correctly specified Thompson sampling in benchmark experiments, outperforms myopic and UCB-style baselines, and is robust to prior mismatch and distribution shift; it also performs well on the real-world SuperStore dataset, especially under heavy censoring.
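The impute-then-act loop summarized above can be illustrated with a toy sketch. It is not the ChronosFlow model: as a loudly labeled assumption, a conjugate exponential-gamma demand model stands in for the learned generative sampler, and the function names, cost parameters, and the choice to act on past completions plus a generated future path are all hypothetical. The only standard fact relied on is that the newsvendor oracle orders the critical-fractile quantile of demand.

```python
import numpy as np


def complete_demand(sales, orders, rng, a0=1.0, b0=1.0, horizon=200):
    """Sample one censoring-consistent completion of latent demand.

    Toy stand-in for a learned generative model (assumption): demand is
    exponential with a Gamma(a0, b0) prior (shape, rate) on its rate.
    """
    sales = np.asarray(sales, dtype=float)
    orders = np.asarray(orders, dtype=float)
    censored = sales >= orders  # sales hit the order, so demand >= order
    # Conjugate posterior for right-censored exponential observations.
    shape = a0 + np.sum(~censored)
    rate = rng.gamma(shape, 1.0 / (b0 + sales.sum()))
    past = sales.copy()
    # Memorylessness: D | D >= q equals q plus a fresh exponential draw,
    # so imputed values never fall below the observed sales.
    past[censored] = orders[censored] + rng.exponential(1.0 / rate, censored.sum())
    future = rng.exponential(1.0 / rate, horizon)
    return past, future


def oracle_order(demand_samples, underage=4.0, overage=1.0):
    """Newsvendor oracle on a demand sample: the critical-fractile quantile."""
    return float(np.quantile(demand_samples, underage / (underage + overage)))


rng = np.random.default_rng(0)
true_mean, T = 10.0, 50  # hypothetical simulation settings
sales, orders = [], []
order = 5.0  # arbitrary initial order
for t in range(T):
    demand = rng.exponential(true_mean)  # latent demand, never seen directly
    orders.append(order)
    sales.append(min(demand, order))     # only sales are observed
    past, future = complete_demand(sales, orders, rng)
    order = oracle_order(np.concatenate([past, future]))
```

Each round draws a fresh completion, so the randomness in the imputed demand plays the role that posterior sampling of a parameter plays in Thompson sampling.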
📝 Abstract
We study inventory control with decision-dependent censoring, focusing on the censored or repeated newsvendor (R-NV), where each order quantity determines whether demand is fully observed or censored by sales. Existing approaches based on parametric Thompson sampling (TS) can be brittle under prior mismatch, while offline imputation methods need not transfer to online learning. Motivated by the predictive view of decision making, we combine these ideas by taking oracle actions on learned completions of latent demand. We propose in-context generative posterior sampling (ICGPS), which uses modern generative models that are meta-trained offline and deployed online by in-context autoregressive generation. Theoretically, we show that the Bayesian regret of ICGPS with a learned completion kernel is bounded by the Bayesian regret of a TS benchmark with the ideal completion kernel plus a deployment penalty scaling as $\sqrt{T}$ times the square root of the completion mismatch. This yields a plug-in template for operational problems with known TS regret bounds. For R-NV, we derive sublinear Bayesian regret by reducing censored feedback to bandit convex optimization feedback. We also show that, under reasonable coverage and stability assumptions, the online completion mismatch is controlled by the offline censored predictive mismatch, so offline predictive quality transfers to online performance. Practically, we instantiate ICGPS with ChronosFlow, which combines a frozen time-series transformer backbone with a trainable conditional normalizing-flow head for fast censoring-consistent sampling. In benchmark experiments, ChronosFlow-ICGPS matches correctly specified TS, outperforms myopic and UCB-style baselines, and is robust to prior mismatch and distribution shift. ChronosFlow-ICGPS also performs well on the real-world SuperStore dataset, especially under heavy censoring.
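The regret statement in the abstract can be written schematically as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $\hat{K}$ denotes the learned completion kernel, $K^\star$ the ideal one, $\Delta(\hat{K}, K^\star)$ the completion mismatch, and $C$ an unspecified constant; the abstract does not state the exact form of the mismatch measure or the constant.

$$
\mathrm{BayesRegret}_T\big(\text{ICGPS};\, \hat{K}\big) \;\le\; \mathrm{BayesRegret}_T\big(\text{TS};\, K^\star\big) \;+\; C\,\sqrt{T}\,\sqrt{\Delta(\hat{K}, K^\star)}
$$

Read this way, any problem with a known TS regret bound for the first term inherits a bound for ICGPS once the mismatch term is controlled, which is the plug-in template the abstract refers to.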