🤖 AI Summary
Classical deep reinforcement learning (DRL) methods struggle to consistently outperform heuristic policies, such as the capped base-stock policy, in inventory optimization under dynamic and uncertain demand, largely because they do not explicitly model system stochasticity or physical constraints.
Method: This paper proposes the first end-to-end inventory control framework integrating controlled stochastic differential equations (SDEs) with deep neural networks. It models demand and inventory dynamics in continuous time, leverages stochastic optimal control theory to ensure policy interpretability, and employs gradient-augmented policy optimization for data-driven adaptability.
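The continuous-time modeling idea can be illustrated with a minimal sketch: simulate a controlled inventory SDE via Euler–Maruyama, where a replenishment policy acts as the control. All names, dynamics, and parameters below are illustrative assumptions, not the paper's actual model; in the paper the policy would be a neural network trained by gradient-augmented policy optimization rather than the simple proportional rule used here.

```python
import numpy as np

# Hypothetical controlled inventory SDE (illustrative, not from the paper):
#   dX_t = (u(X_t) - mu_D) dt + sigma_D dW_t
# X_t: inventory level, u(.): replenishment-rate policy,
# mu_D / sigma_D: mean demand rate and demand volatility.

rng = np.random.default_rng(0)

mu_D, sigma_D = 5.0, 2.0   # demand drift and volatility (assumed values)
dt, T = 0.01, 10.0         # Euler-Maruyama step size and horizon
n_steps = int(T / dt)

def policy(x, target=20.0, gain=0.5):
    # Base-stock-like proportional control: order faster when inventory
    # falls below the target; never order a negative amount.
    return max(gain * (target - x), 0.0)

x = 20.0                   # initial inventory
path = [x]
for _ in range(n_steps):
    drift = policy(x) - mu_D
    # Euler-Maruyama update: drift term plus Brownian increment
    x += drift * dt + sigma_D * np.sqrt(dt) * rng.standard_normal()
    path.append(x)

avg_inventory = float(np.mean(path))
```

A framework like the one summarized would replace `policy` with a trained network and differentiate the expected cost (holding plus stockout) through the simulated dynamics.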
Contribution/Results: Evaluated across diverse supply chain simulation environments, the framework significantly reduces stockout rates and holding costs. It achieves an average total cost reduction of 18.7% over state-of-the-art DRL baselines (SAC and DQN), demonstrating superior robustness, interpretability, and empirical performance under uncertainty.