🤖 AI Summary
Existing order fulfillment methods struggle to generalize across tote-handling robotic systems of varying scales and types. This work proposes a unified, scalable sequential decision-making framework that, for the first time, integrates structured combinatorial optimization with multi-agent reinforcement learning to jointly coordinate order, tote, and robot decisions, enabling real-time, cross-scale, and transferable decision-making. Experimental results show that in small-scale systems, solution quality deviates from optimality by less than 3.5% on average; in large-scale settings, the approach outperforms heuristic baselines, reducing total tote movements by 8–12% and over 30% compared to state-of-the-art rule-based methods, which improves energy efficiency and throughput stability.
📝 Abstract
Driven by the rapid expansion of e-commerce and small-batch production, the size of intralogistics load units for finished goods, semi-finished goods, and raw materials is steadily shrinking. Totes are gradually replacing pallets as the primary handling and storage container. This shift has propelled tote-handling robotic systems to the forefront of automated order fulfillment centers. The order-fulfillment decisions of tote-handling robotic systems share a common order-tote-robot sequential decision-making structure. Existing studies primarily focus on decision mechanisms tailored to particular systems, making them difficult to generalize or transfer to other contexts. We propose an Omni-scale Learning-based Sequential Decision Framework for Order Fulfillment of Tote-handling Robotic Systems (OLSF-TRS), a generalized and scalable sequential decision framework that combines structured combinatorial optimization with multi-agent reinforcement learning to coordinate order, tote, and robot decisions. On small-scale tote-handling robotic systems, OLSF-TRS achieves near-optimal performance, with average optimality gaps below 3.5% across two distinct system configurations. In large-scale scenarios, OLSF-TRS consistently outperforms heuristic baselines across two different system types, reducing total tote movements by 8–12% and over 30% compared to state-of-the-art rule-based approaches, while maintaining real-time responsiveness. These improvements translate into tangible operational benefits, including cost reduction, lower energy consumption, and enhanced throughput stability. The proposed framework provides efficient, unified order fulfillment decision-making for widely deployed tote-handling robotic systems, supporting high-quality order fulfillment in both e-commerce and industrial logistics sectors.