🤖 AI Summary
Retailers face high warehousing costs and low asset turnover due to backlogged returns.
Method: This paper proposes a real-time redistribution decision mechanism that bypasses centralized warehousing by dynamically assigning returns to stores. It is the first to formulate return management as an online multiple knapsack problem and introduces an end-to-end reinforcement learning framework based on Deep Q-Networks (DQN). The framework incorporates a state encoder integrating product attributes, store inventory levels, and time-sensitive constraints, along with an action space adaptive to dynamic storage capacities.
Contribution/Results: Under stringent concurrent operational constraints, the method achieves near-optimal revenue together with sub-millisecond decision latency. In simulation, its revenue falls only 3% short of the offline benchmark while average warehousing time drops by 96%, improving inventory turnover and warehouse capacity utilization.
📝 Abstract
In retail warehouses, returned products are typically placed in intermediate storage until a decision about further shipment to stores is made. The longer products are held in storage, the more inefficient and costly the returns management process becomes, since sufficient storage area has to be provided and maintained while the products are not on sale. To reduce the average product storage time, we consider an alternative solution in which reallocation decisions are made instantly upon a product's arrival in the warehouse, with only a limited number of products allowed to remain in storage simultaneously. We cast the problem as an online multiple knapsack problem and propose a novel reinforcement learning approach that packs the items (products) into the knapsacks (stores) such that the overall value (expected revenue) is maximized. Empirical evaluations on simulated data demonstrate that, compared to the usual offline decision procedure, our approach incurs a performance gap of only 3% while reducing the average storage time of a product by 96%.
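To make the online multiple knapsack setting concrete, the following is a minimal sketch under stated assumptions, not the authors' method. It replaces the reinforcement learning policy with a simple greedy rule, and the names `Item`, `greedy_assign`, `run_online`, and `buffer_limit`, as well as the per-store value list, are hypothetical. Each arriving product is assigned at once to the fitting store with the highest expected revenue; a product that fits nowhere is held in the limited intermediate storage if there is room, and is otherwise turned away.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Item:
    size: int            # storage units the product occupies in a store (assumed)
    values: List[float]  # assumed expected revenue of the product in each store


def greedy_assign(item: Item, capacities: Sequence[int]) -> Optional[int]:
    """Return the fitting store with the highest value, or None if none fits."""
    best: Optional[int] = None
    for store, cap in enumerate(capacities):
        if item.size <= cap and (best is None or item.values[store] > item.values[best]):
            best = store
    return best


def run_online(
    items: Sequence[Item], capacities: Sequence[int], buffer_limit: int
) -> Tuple[float, List[Optional[int]], List[Item]]:
    """Process items one by one in arrival order (greedy stand-in for a learned policy)."""
    remaining = list(capacities)          # remaining capacity per store (knapsack)
    held: List[Item] = []                 # limited intermediate storage
    assignments: List[Optional[int]] = []
    total = 0.0
    for item in items:
        store = greedy_assign(item, remaining)
        if store is not None:
            remaining[store] -= item.size
            total += item.values[store]
            assignments.append(store)
        elif len(held) < buffer_limit:
            held.append(item)             # decision deferred: item stays in storage
            assignments.append(None)
        else:
            assignments.append(-1)        # storage full: item is turned away (assumption)
    return total, assignments, held


# Toy instance: two stores with capacities 3 and 2, storage for one product.
items = [Item(2, [5.0, 3.0]), Item(2, [4.0, 6.0]), Item(3, [9.0, 1.0])]
total, assignments, held = run_online(items, capacities=[3, 2], buffer_limit=1)
print(total, assignments, len(held))
```

A learned policy would replace `greedy_assign` and could, for example, hold back capacity for more valuable products expected later, which a greedy rule cannot do.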