Omni-scale Learning-based Sequential Decision Framework for Order Fulfillment of Tote-handling Robotic Systems

📅 2026-05-09

🤖 AI Summary
Existing order fulfillment methods struggle to generalize across tote-handling robotic systems of varying scales and types. This work proposes a unified, scalable sequential decision-making framework that integrates structured combinatorial optimization with multi-agent reinforcement learning to coordinate order, tote, and robot decisions, enabling real-time decision-making that transfers across scales and system types. In small-scale systems, the average optimality gap is below 3.5% across two system configurations; in large-scale settings, the approach outperforms heuristic baselines and reduces total tote movements by 8-12% and by over 30% compared to state-of-the-art rule-based methods, which the authors link to lower cost, lower energy consumption, and more stable throughput.
📝 Abstract
Driven by the rapid expansion of e-commerce and small-batch production, the size of the intralogistics load unit of finished goods, semi-finished goods, and raw materials is steadily shrinking. Totes are gradually replacing pallets as the primary handling and storage container. This shift has propelled tote-handling robotic systems to the forefront of automated order fulfillment centers. The order-fulfillment decisions of tote-handling robotic systems share a common order-tote-robot sequential decision-making nature. Existing studies primarily focus on decision mechanisms tailored to particular systems, making it difficult to generalize or transfer them to other contexts. We propose an Omni-scale Learning-based Sequential Decision Framework for Order Fulfillment of Tote-handling Robotic Systems (OLSF-TRS), a generalized and scalable sequential decision framework that combines structured combinatorial optimization with multi-agent reinforcement learning to coordinate order, tote, and robot decisions. On small-scale tote-handling robotic systems, OLSF-TRS achieves near-optimal performance with average optimality gaps below 3.5% across two distinct system configurations. In large-scale scenarios, OLSF-TRS consistently outperforms heuristic baselines across two different system types, reducing total tote movements by 8-12% and over 30% compared to SOTA rule-based approaches, while maintaining real-time responsiveness. These improvements translate into tangible operational benefits, including cost reduction, lower energy consumption, and enhanced throughput stability. The proposed framework delivers an efficient and unified approach to order fulfillment decision-making for widely deployed tote-handling robotic systems, supporting high-quality order fulfillment in both e-commerce and industrial logistics sectors.
Problem

Research questions and friction points this paper is trying to address.

order fulfillment
tote-handling robotic systems
sequential decision-making
generalization
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Omni-scale learning
Sequential decision-making
Tote-handling robotic systems
Multi-agent reinforcement learning
Combinatorial optimization
Jiaxin Liu
School of Vehicle and Mobility, Tsinghua University
autonomous driving; reinforcement learning
Peng Yang
Tsinghua University
robotic warehousing systems; logistics facility planning and operations; order picking
Yuping Li
Institution of Data and Information, Shenzhen International Graduate School, Tsinghua University, Nanshan District, Shenzhen 518055, China
Xinyue Xie
Institution of Data and Information, Shenzhen International Graduate School, Tsinghua University, Nanshan District, Shenzhen 518055, China