🤖 AI Summary
This work addresses the tightly coupled, highly time-sensitive problem of order assignment and robot scheduling in Robotic Mobile Fulfillment Systems (RMFS), where conventional approaches struggle to balance global optimality with low-latency response. The paper proposes SOAR, a framework that for the first time unifies both tasks into a single event-driven Markov Decision Process. It uses soft order assignments as observations, so that the two decisions are optimized jointly, end to end, in response to asynchronous system events. SOAR combines a heterogeneous graph Transformer that encodes the warehouse state, domain-informed reward shaping, and an event-driven execution mechanism. Experiments on synthetic and real-world industrial datasets show that SOAR reduces makespan by 7.5% and average order completion time by 15.4% while keeping decision latency below 100 ms, and a sim-to-real deployment validates its practical efficacy.
📝 Abstract
Robotic Mobile Fulfillment Systems (RMFS) rely on mobile robots for automated inventory transportation, coordinating order allocation and robot scheduling to improve warehousing efficiency. However, optimizing RMFS is challenging because of strict real-time constraints and the strong coupling of multi-phase decisions. Existing methods either decompose the problem into isolated sub-tasks, guaranteeing responsiveness at the cost of global optimality, or rely on computationally expensive global optimization models that are unsuitable for dynamic industrial environments. To bridge this gap, we propose SOAR, a unified Deep Reinforcement Learning framework for real-time joint optimization. SOAR merges order allocation and robot scheduling into a single decision process by treating soft order allocations as observations. We formulate this as an Event-Driven Markov Decision Process, enabling the agent to perform simultaneous scheduling in response to asynchronous system events. Technically, we employ a Heterogeneous Graph Transformer to encode the warehouse state and integrate phased domain knowledge. Additionally, we incorporate a reward shaping strategy to address sparse feedback in long-horizon tasks. Extensive experiments on synthetic and real-world industrial datasets, conducted in collaboration with Geekplus, show that SOAR reduces global makespan by 7.5% and average order completion time by 15.4% with sub-100 ms latency. Furthermore, sim-to-real deployment confirms its practical viability and significant performance gains in production environments. The code is available at https://github.com/200815147/SOAR.
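To make the event-driven formulation more concrete, here is a minimal toy sketch in Python. It is not SOAR's implementation. It assumes a 1-D warehouse where each order reduces to a single rack position, and decisions fire only on a hypothetical "robot becomes idle" event. The hypothetical helper `soft_allocation` turns negative travel distances into a softmax distribution over pending orders, as a stand-in for the soft order allocations the abstract uses as observations. A greedy argmax replaces the learned Heterogeneous Graph Transformer policy, and reward shaping is not modeled.

```python
import heapq
import math
import random
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float
    robot_id: int = field(compare=False)  # robot that has just become idle


def soft_allocation(robot_pos, orders):
    """Hypothetical soft allocation: softmax over negative distances to pending orders."""
    scores = [-abs(robot_pos - pos) for pos in orders.values()]
    best = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - best) for s in scores]
    total = sum(weights)
    return {oid: w / total for oid, w in zip(orders, weights)}


def run_episode(num_robots=2, num_orders=5, seed=0):
    rng = random.Random(seed)
    # Toy 1-D layout: order id -> rack position (hypothetical simplification).
    orders = {oid: rng.uniform(0.0, 10.0) for oid in range(num_orders)}
    robot_pos = [0.0] * num_robots
    events = [Event(0.0, r) for r in range(num_robots)]
    heapq.heapify(events)
    log, makespan = [], 0.0
    while events:
        ev = heapq.heappop(events)  # next asynchronous event, in time order
        if not orders:
            continue  # nothing left to assign; let the queue drain
        obs = soft_allocation(robot_pos[ev.robot_id], orders)  # part of the observation
        chosen = max(obs, key=obs.get)  # greedy stand-in for a learned policy
        target = orders.pop(chosen)
        finish = ev.time + abs(robot_pos[ev.robot_id] - target)
        robot_pos[ev.robot_id] = target
        makespan = max(makespan, finish)
        log.append((ev.time, ev.robot_id, chosen))
        heapq.heappush(events, Event(finish, ev.robot_id))  # robot idles again later
    return log, makespan


if __name__ == "__main__":
    log, makespan = run_episode()
    for t, robot, order in log:
        print(f"t={t:5.2f}  robot {robot} -> order {order}")
    print(f"makespan = {makespan:.2f}")
```

The event queue is ordered by timestamp, so each decision is made at the moment a robot frees up rather than on a fixed clock tick. This is the property an event-driven MDP is meant to capture. In a learned setting, the argmax would be replaced by a trained policy over much richer state features.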