Real-Time Integrated Dispatching and Idle Fleet Steering with Deep Reinforcement Learning for A Meal Delivery Platform

📅 2025-01-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the coupled optimization challenge of order assignment, rider dispatching, and supply–demand imbalance in real-time food-delivery platform scheduling, this paper proposes a forward-aware deep reinforcement learning dual-control framework. Methodologically, it introduces the first joint iterative training mechanism for dispatching and steering; designs a convolutional DQN-based fair rider representation that explicitly models individual workload balancing; incorporates mean-field approximation to capture regional supply–demand dynamics for localized rebalancing; and integrates explicit demand forecasting with Markov decision process modeling. Experimental results demonstrate that the framework achieves millisecond-scale decision latency while significantly improving delivery timeliness (12.3% average reduction in delivery time), enhancing rider workload fairness (18.7% reduction in Gini coefficient), and effectively mitigating regional supply–demand imbalances.

Technology Category

Application Category

📝 Abstract
To achieve high service quality and profitability, meal delivery platforms like Uber Eats and Grubhub must strategically operate their fleets to ensure timely deliveries for current orders while mitigating the consequential impacts of suboptimal decisions that leads to courier understaffing in the future. This study set out to solve the real-time order dispatching and idle courier steering problems for a meal delivery platform by proposing a reinforcement learning (RL)-based strategic dual-control framework. To address the inherent sequential nature of these problems, we model both order dispatching and courier steering as Markov Decision Processes. Trained via a deep reinforcement learning (DRL) framework, we obtain strategic policies by leveraging the explicitly predicted demands as part of the inputs. In our dual-control framework, the dispatching and steering policies are iteratively trained in an integrated manner. These forward-looking policies can be executed in real-time and provide decisions while jointly considering the impacts on local and network levels. To enhance dispatching fairness, we propose convolutional deep Q networks to construct fair courier embeddings. To simultaneously rebalance the supply and demand within the service network, we propose to utilize mean-field approximated supply-demand knowledge to reallocate idle couriers at the local level. Utilizing the policies generated by the RL-based strategic dual-control framework, we find the delivery efficiency and fairness of workload distribution among couriers have been improved, and under-supplied conditions have been alleviated within the service network. Our study sheds light on designing an RL-based framework to enable forward-looking real-time operations for meal delivery platforms and other on-demand services.
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Deep Learning
Vehicle Scheduling Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
Real-time Scheduling
Deep Learning Training
J
Jingyi Cheng
Transport and Planning, Delft University of Technology, Stevinweg 1, Delft, 2628CN, Zuid Holland, Netherlands
Shadi Sharif Azadeh
Shadi Sharif Azadeh
Associate Professor TU Delft /Research Affiliate EUR, EPFL
Operations ResearchTransportationReinforcement learningReal-time decisionsChoice models