AlphaInventory: Evolving White-Box Inventory Policies via Large Language Models with Deployment Guarantees

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies how to evolve inventory policies in online, non-stationary environments and proposes AlphaInventory, an end-to-end framework that trains a large language model with reinforcement learning. The approach uses demand data together with additional numerical and textual features to generate interpretable, white-box inventory policies, and applies a confidence-interval-based certification mechanism to provide statistical safety guarantees before deployment. A unified theoretical interface connects the training, inference, and deployment phases, characterizing the probability of evolving a statistically safe and improved policy and the deployment gap relative to an oracle-safe benchmark. On both synthetic and real-world retail datasets, the evolved policies outperform classical inventory policies and deep-learning-based methods, and in canonical inventory settings they improve upon existing benchmarks.

📝 Abstract

We study how large language models can be used to evolve inventory policies in online, non-stationary environments. Our work is motivated by recent advances in LLM-based evolutionary search, such as AlphaEvolve, which demonstrates strong performance on static and highly structured problems such as mathematical discovery, but is not directly suited to online dynamic inventory settings. To this end, we propose AlphaInventory, an end-to-end inventory-policy evolution and inference framework grounded in confidence-interval-based certification. The framework trains a large language model using reinforcement learning, incorporates demand data as well as numerical and textual features beyond demand, and generates white-box inventory policies with statistical safety guarantees for deployment in future periods. We further introduce a unified theoretical interface that connects training, inference, and deployment. This allows us to characterize the probability that AlphaInventory evolves a statistically safe and improved policy, and to quantify the deployment gap relative to the oracle-safe benchmark. Tested on both synthetic data and real-world retail data, AlphaInventory outperforms classical inventory policies and deep-learning-based methods. In canonical inventory settings, it evolves new policies that improve upon existing benchmarks.
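
As a rough illustration of what confidence-interval-based certification of a white-box policy could look like, the Python sketch below compares a candidate rule against an incumbent on simulated demand paths and certifies the candidate only if a one-sided upper confidence bound on the mean paired cost difference is below zero. The policy form (a simple base-stock rule), the cost parameters, the drifting demand generator, the normal-approximation bound, and every function name here are assumptions made for illustration. The abstract does not specify them, and this is not the authors' implementation.

```python
import math
import random
import statistics


def base_stock_policy(inventory, demand_history, z=1.0, window=8):
    """Hypothetical white-box rule: order up to mean + z * std of recent demand."""
    recent = demand_history[-window:]
    mean = statistics.fmean(recent)
    std = statistics.pstdev(recent) if len(recent) > 1 else 0.0
    target = mean + z * std
    return max(0.0, target - inventory)


def simulate_cost(policy, demands, holding=1.0, shortage=4.0, warmup=8):
    """Average per-period cost on one demand path (zero lead time, lost sales)."""
    inventory = 0.0
    total = 0.0
    for t in range(warmup, len(demands)):
        inventory += policy(inventory, demands[:t])  # order arrives immediately
        leftover = inventory - demands[t]
        total += holding * max(leftover, 0.0) + shortage * max(-leftover, 0.0)
        inventory = max(leftover, 0.0)  # unmet demand is lost, not backlogged
    return total / (len(demands) - warmup)


def certify(candidate_costs, incumbent_costs, z_crit=1.645):
    """Certify if the one-sided upper bound on the mean paired difference is < 0.

    Uses a normal approximation (z_crit = 1.645 for roughly 95% one-sided),
    which is an assumption of this sketch.
    """
    diffs = [c - i for c, i in zip(candidate_costs, incumbent_costs)]
    mean = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    upper = mean + z_crit * std_err
    return upper < 0.0, upper


def demand_path(rng, periods=60):
    """Assumed non-stationary demand: Gaussian noise around a drifting mean."""
    return [max(0.0, rng.gauss(20.0 + 0.2 * t, 5.0)) for t in range(periods)]


def incumbent(inventory, history):
    return base_stock_policy(inventory, history, z=0.0)


def candidate(inventory, history):
    return base_stock_policy(inventory, history, z=0.8)


rng = random.Random(0)
paths = [demand_path(rng) for _ in range(200)]
candidate_costs = [simulate_cost(candidate, p) for p in paths]
incumbent_costs = [simulate_cost(incumbent, p) for p in paths]
certified, upper_bound = certify(candidate_costs, incumbent_costs)
print(f"certified={certified}, upper bound on cost difference={upper_bound:.3f}")
```

Evaluating both policies on the same demand paths makes the comparison paired, which typically tightens the interval. A candidate that fails the check would simply not be deployed, leaving the incumbent in place.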

Problem

Research questions and friction points this paper is trying to address.

inventory policy

large language models

non-stationary environments

deployment guarantees

white-box policies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Inventory Policy

White-Box Policy