AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees

2026-04-30

AI Summary
This work studies how large language models can evolve inventory policies in online, non-stationary environments. It proposes AlphaInventory, an end-to-end framework that trains a large language model with reinforcement learning, uses demand data together with additional numerical and textual features, and generates interpretable, white-box inventory policies. A confidence-interval-based certification mechanism provides statistical safety guarantees for deployment, and a unified theoretical interface connects the training, inference, and deployment phases. On both synthetic and real-world retail datasets, the evolved policies outperform classical inventory policies and deep-learning-based methods, and in canonical inventory settings they improve upon existing benchmarks.
๐Ÿ“ Abstract
We study how large language models can be used to evolve inventory policies in online, non-stationary environments. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance on static and highly structured problems such as mathematical discovery but is not directly suited to online, dynamic inventory settings. To this end, we propose AlphaInventory, an end-to-end inventory-policy evolution and inference framework grounded in confidence-interval-based certification. The framework trains a large language model using reinforcement learning, incorporates demand data as well as numerical and textual features beyond demand, and generates white-box inventory policies with statistical safety guarantees for deployment in future periods. We further introduce a unified theoretical interface that connects training, inference, and deployment. This allows us to characterize the probability that AlphaInventory evolves a statistically safe and improved policy, and to quantify the deployment gap relative to the oracle-safe benchmark. Tested on both synthetic data and real-world retail data, AlphaInventory outperforms classical inventory policies and deep-learning-based methods. In canonical inventory settings, it evolves new policies that improve upon existing benchmarks.
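To make the idea of confidence-interval-based certification concrete, here is a minimal sketch in Python. It is not the paper's procedure: the base-stock rule, the cost parameters, the uniform demand model, the normal-approximation bound, and every function name (`base_stock_order`, `simulate_cost`, `certify`) are assumptions chosen for illustration. The sketch treats a candidate white-box policy as "certified" only if a one-sided upper confidence bound on its mean cost difference against an incumbent policy is below zero.

```python
import math
import random
from statistics import mean, stdev


def base_stock_order(inventory, target):
    """Hypothetical white-box rule: order up to `target` units."""
    return max(0.0, target - inventory)


def simulate_cost(target, demands, holding=1.0, shortage=4.0):
    """Average per-period cost, assuming lost sales and zero lead time."""
    inventory, total = 0.0, 0.0
    for d in demands:
        inventory += base_stock_order(inventory, target)
        leftover = inventory - d
        # Pay holding cost on leftover stock, shortage cost on unmet demand.
        total += holding * max(leftover, 0.0) + shortage * max(-leftover, 0.0)
        inventory = max(leftover, 0.0)
    return total / len(demands)


def certify(candidate_costs, incumbent_costs, z=1.645):
    """Return (certified, upper_bound) for the paired cost difference.

    Uses a one-sided normal-approximation bound (z=1.645 is roughly 95%).
    The candidate is certified only if the bound is strictly negative.
    """
    diffs = [c - b for c, b in zip(candidate_costs, incumbent_costs)]
    upper = mean(diffs) + z * stdev(diffs) / math.sqrt(len(diffs))
    return upper < 0.0, upper


# Assumed evaluation data: 200 demand paths of 50 periods, uniform on [0, 20].
rng = random.Random(0)
paths = [[rng.uniform(0, 20) for _ in range(50)] for _ in range(200)]

incumbent = [simulate_cost(10.0, p) for p in paths]  # incumbent target level
candidate = [simulate_cost(16.0, p) for p in paths]  # candidate target level
certified, upper = certify(candidate, incumbent)
print(certified, round(upper, 3))
```

Pairing the two policies on the same demand paths keeps the variance of the difference small, so the bound is tighter than it would be with independent samples. A deployed certification step would also have to account for non-stationarity, which this sketch ignores.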
Problem

Research questions and friction points this paper is trying to address.

inventory policy
large language models
non-stationary environments
deployment guarantees
white-box policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Inventory Policy
White-Box Policy
Statistical Safety Guarantees
Reinforcement Learning