Deep Reinforcement Learning for Dynamic Order Picking in Warehouse Operations

📅 2024-08-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses real-time order picking in single-block warehouses, where orders arrive dynamically and picker routes must be adjusted online. Method: The authors propose an end-to-end deep reinforcement learning (DRL) framework that dynamically optimizes picker routing under fluctuating order arrivals. Using a Deep Q-Network (DQN) architecture, they design a state encoding that integrates order-queue status, rack occupancy, and the picker's location, and introduce a reward hyperparameter that flexibly balances distance traveled against order completion time. Contribution/Results: To the authors' knowledge, this is the first systematic application of DRL to real-time picking under dynamic order streams, and the framework shows strong robustness on out-of-sample test instances. Experiments report an approximately 98% fulfillment rate at a high order arrival rate (λ = 0.09), versus roughly 82% for benchmark algorithms (a 16-percentage-point gain), while significantly reducing average order throughput time and the number of unfulfilled orders.
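
The summary names the state components (order-queue status, rack occupancy, picker location) but not their exact encoding. The sketch below shows one plausible way such a flat state vector could feed a DQN; all dimensions, layer sizes, and the action count are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class PickingDQN(nn.Module):
    """Minimal DQN sketch: maps a flat warehouse state to Q-values.

    The feature sizes below are hypothetical, not taken from the paper:
      - queue:  K pending orders, each a fixed-length feature vector
      - racks:  binary occupancy flags over R storage locations
      - picker: (x, y) position of the autonomous picking device
    """

    def __init__(self, n_queue=10, order_dim=4, n_racks=50, n_actions=5):
        super().__init__()
        state_dim = n_queue * order_dim + n_racks + 2
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),  # one Q-value per routing action
        )

    def forward(self, state):
        return self.net(state)

# Example: encode one (empty) state and pick the greedy action.
queue = torch.zeros(10 * 4)        # padded order-queue features
racks = torch.zeros(50)            # rack occupancy flags
picker = torch.tensor([3.0, 7.0])  # picker (x, y) location
state = torch.cat([queue, racks, picker]).unsqueeze(0)
action = PickingDQN()(state).argmax(dim=1)
```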

📝 Abstract
Order picking is a pivotal operation in warehouses that directly impacts overall efficiency and profitability. This study addresses the dynamic order picking problem, a significant concern in modern warehouse management, where real-time adaptation to fluctuating order arrivals and efficient picker routing are crucial. Traditional methods, which often depend on static optimization algorithms designed around fixed order sets for picker routing, fall short in addressing the challenges of this dynamic environment. To overcome these challenges, we propose a Deep Reinforcement Learning (DRL) framework tailored for single-block warehouses equipped with an autonomous picking device. By dynamically optimizing picker routes, our approach significantly reduces order throughput times and unfulfilled orders, particularly under high order arrival rates. We benchmark our DRL model against established algorithms, utilizing instances generated based on standard practices in the order picking literature. Experimental results demonstrate the superiority of our DRL model over benchmark algorithms. For example, at a high order arrival rate of 0.09 (i.e., 9 orders per 100 units of time on average), our approach achieves an order fulfillment rate of approximately 98%, compared to the 82% fulfillment rate observed with benchmark algorithms. We further investigate the integration of a hyperparameter in the reward function that allows for flexible balancing between distance traveled and order completion time. Finally, we demonstrate the robustness of our DRL model on out-of-sample test instances.
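
The quoted arrival rate of λ = 0.09 corresponds to exponential inter-arrival times with mean 1/0.09 ≈ 11.1 time units, i.e., about 9 orders per 100 time units. A minimal simulation of such a dynamic order stream follows; the horizon and seed are arbitrary choices, not experiment settings from the paper.

```python
import random

def simulate_arrivals(lam=0.09, horizon=1000.0, seed=42):
    """Generate a Poisson order-arrival stream: exponential gaps at rate lam."""
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(lam)  # mean inter-arrival time = 1 / lam
        if t > horizon:
            break
        arrivals.append(t)
    return arrivals

times = simulate_arrivals()
print(f"{len(times)} orders over 1000 time units (expected ~{0.09 * 1000:.0f})")
```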
Problem

Research questions and friction points this paper is trying to address.

Dynamic order picking in warehouses with real-time adaptation
Optimizing picker routes using Deep Reinforcement Learning
Balancing distance traveled and order completion time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Reinforcement Learning optimizes picker routes
Dynamic adaptation to fluctuating order arrivals
Balances distance and time via reward function
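
The paper's exact reward function is not reproduced on this page, so the form below is an assumption: a convex combination, weighted by a tunable `alpha`, of a travel-distance penalty and an order-completion-time penalty, mirroring the trade-off described above.

```python
def step_reward(distance_step, completed_cycle_times, alpha=0.5):
    """Hedged sketch of a per-step reward for the picking agent.

    alpha plays the role of the tunable hyperparameter mentioned in the
    abstract: alpha -> 1 penalizes travel distance more heavily, while
    alpha -> 0 prioritizes short order completion (cycle) times. The
    linear form and the completion bonus are illustrative assumptions.
    """
    distance_penalty = -distance_step              # distance moved this step
    latency_penalty = -sum(completed_cycle_times)  # cycle times of orders just fulfilled
    completion_bonus = len(completed_cycle_times)  # flat bonus per fulfilled order
    return alpha * distance_penalty + (1.0 - alpha) * latency_penalty + completion_bonus

# Example: the picker moved 3 units and fulfilled one order after 12.5 time units.
r = step_reward(distance_step=3.0, completed_cycle_times=[12.5], alpha=0.7)
```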
👥 Authors
Sasan Mahmoudinazlou
Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL, USA
Abhay Sobhanan
Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL, USA
Hadi Charkhgard
Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL, USA
A. Eshragh
Carey Business School, Johns Hopkins University, Washington, D.C., USA
George Dunn
School of Information and Physical Sciences, University of Newcastle, NSW, Australia