ORPR: An OR-Guided Pretrain-then-Reinforce Learning Model for Inventory Management

šŸ“… 2025-12-21
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ¤– AI Summary
Complex inventory management remains challenging because AI's adaptability and the structural rigor of operations research (OR) are rarely combined effectively. Method: this paper proposes a two-stage "pretraining–reinforcement alignment" framework: first, a simulation-augmented OR model generates constraint-aware decision labels to train a domain-aware foundation model; second, a constraint-embedding reinforcement learning (RL) mechanism unifies optimization and adaptation, internalizing OR optimality principles while enabling expert-guided scenario customization. Contribution/Results: we introduce the first OR-guided alignment paradigm, overcoming the limitations of purely data-driven approaches to achieve strong interpretability, high robustness, and cross-scenario generalization, even with lightweight models. Empirical validation at JD.com demonstrates significant improvements: inventory turnover days reduced by 5.27, in-stock rate increased by 2.29%, and holding costs decreased by 29.95%, substantially outperforming state-of-the-art industrial solutions and confirming practical efficacy.

šŸ“ Abstract
As the pursuit of synergy between Artificial Intelligence (AI) and Operations Research (OR) gains momentum in handling complex inventory systems, a critical challenge persists: how to effectively reconcile AI's adaptive perception with OR's structural rigor. To bridge this gap, we propose a novel OR-Guided "Pretrain-then-Reinforce" framework. To provide structured guidance, we propose a simulation-augmented OR model that generates high-quality reference decisions, implicitly capturing complex business constraints and managerial preferences. Leveraging these OR-derived decisions as foundational training labels, we design a domain-informed deep learning foundation model to establish core decision-making capabilities, followed by a reinforcement learning (RL) fine-tuning stage. Uniquely, we position RL as a deep alignment mechanism that enables the AI agent to internalize the optimality principles of OR, while simultaneously leveraging exploration for general policy refinement and allowing expert guidance for scenario-specific adaptation (e.g., promotional events). Validated through extensive numerical experiments and a field deployment at JD.com augmented by a Difference-in-Differences (DiD) analysis, our model significantly outperforms incumbent industrial practices, delivering real-world gains of a 5.27-day reduction in turnover and a 2.29% increase in in-stock rates, alongside a 29.95% decrease in holding costs. Contrary to the prevailing trend of brute-force model scaling, our study demonstrates that a lightweight, domain-informed model can deliver state-of-the-art performance and robust transferability when guided by structured OR logic. This approach offers a scalable and cost-effective paradigm for intelligent supply chain management, highlighting the value of deeply aligning AI with OR.
Problem

Research questions and friction points this paper is trying to address.

Reconcile AI's adaptive perception with OR's structural rigor in inventory management.
Generate high-quality reference decisions capturing business constraints and preferences.
Enable AI to internalize OR's optimality principles while refining policies via reinforcement learning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

OR-guided pretrain-then-reinforce learning framework
Simulation-augmented OR model generates reference decisions
Reinforcement learning fine-tunes for alignment and adaptation
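The two-stage idea above can be illustrated with a toy sketch. Everything here is an illustrative assumption, not the paper's actual method: a simple base-stock rule stands in for the simulation-augmented OR model, a linear policy fitted by least squares stands in for the foundation model, and seeded hill-climbing on a simulated reward stands in for the RL fine-tuning stage.

```python
import random

def or_reference_order(demand_forecast, on_hand, target_days=3):
    """Stand-in for the simulation-augmented OR model: a base-stock
    rule that orders up to target_days of forecast demand."""
    return max(0.0, target_days * demand_forecast - on_hand)

def pretrain(samples, lr=0.002, epochs=5000):
    """Stage 1 (pretraining): fit a linear policy
    order = w0*forecast + w1*on_hand + w2 to the OR labels via SGD."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (f, s), y in samples:
            err = w[0] * f + w[1] * s + w[2] - y
            w[0] -= lr * err * f
            w[1] -= lr * err * s
            w[2] -= lr * err
    return w

def simulate_reward(w, demands, holding_cost=1.0, stockout_cost=5.0):
    """Roll a tiny single-item inventory simulation (perfect forecast
    assumed: forecast == realized demand) and return total reward."""
    on_hand, total = 5.0, 0.0
    for d in demands:
        order = max(0.0, w[0] * d + w[1] * on_hand + w[2])
        on_hand += order
        sold = min(on_hand, d)
        lost = d - sold
        on_hand -= sold
        total -= holding_cost * on_hand + stockout_cost * lost
    return total

def reinforce_finetune(w, demands, steps=200, sigma=0.1, seed=0):
    """Stage 2 (stand-in for RL fine-tuning): seeded hill-climbing
    that keeps parameter perturbations improving the simulated reward."""
    rng = random.Random(seed)
    best = simulate_reward(w, demands)
    for _ in range(steps):
        cand = [wi + rng.gauss(0.0, sigma) for wi in w]
        r = simulate_reward(cand, demands)
        if r > best:
            w, best = cand, r
    return w, best
```

Pretraining anchors the policy to OR-consistent decisions; the fine-tuning step then adjusts it against a reward that trades off holding and stockout costs, mirroring the paper's alignment-then-adaptation ordering.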
Authors

Lingjie Zhao
Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
Xue Yu
Supply Chain Tech Team Y, JD.com, Beijing 101111, China
Yongzhi Qi
Supply Chain Tech Team Y, JD.com, Beijing 101111, China
Hao Hu
Supply Chain Tech Team Y, JD.com, Beijing 101111, China
Jianshen Zhang
Supply Chain Tech Team Y, JD.com, Beijing 101111, China
Yingzheng Ma
Supply Chain Tech Team Y, JD.com, Beijing 101111, China
Shuyu Han
Supply Chain Tech Team Y, JD.com, Beijing 101111, China
Wei Qi
Tsinghua University (Operations Management)
Zuo-Jun Max Shen
Faculty of Engineering and Faculty of Business and Economics, The University of Hong Kong, Hong Kong 999077, China