Non-myopic Matching and Rebalancing in Large-Scale On-Demand Ride-Pooling Systems Using Simulation-Informed Reinforcement Learning

📅 2025-10-28

📈 Citations: 0

✨ Influential: 0

career value

197K/year

🤖 AI Summary

To address myopic scheduling decisions in large-scale on-demand ride-pooling systems—which optimize only immediate matchings while neglecting long-term vehicle distribution and demand dynamics—this paper proposes a simulation-augmented non-myopic reinforcement learning (RL) framework. It is the first to embed a high-fidelity ride-pooling simulator into the RL training loop to enable accurate long-horizon reward evaluation. Methodologically, we design an n-step temporal difference learning scheme that jointly optimizes passenger matching and vacant-vehicle rebalancing policies, and learn spatiotemporal state-value functions using real taxi request data from New York City. Experiments demonstrate that, compared to myopic baselines, our approach improves service rate by 8.4%, reduces average waiting and in-vehicle travel times, and enables fleet size reduction by over 25%. With coordinated rebalancing, service rate further increases by 15.1% and waiting time decreases by 27.3%.

Technology Category

Application Category

📝 Abstract

Ride-pooling, also known as ride-sharing, shared ride-hailing, or microtransit, is a service wherein passengers share rides. This service can reduce costs for both passengers and operators and reduce congestion and environmental impacts. A key limitation, however, is its myopic decision-making, which overlooks long-term effects of dispatch decisions. To address this, we propose a simulation-informed reinforcement learning (RL) approach. While RL has been widely studied in the context of ride-hailing systems, its application in ride-pooling systems has been less explored. In this study, we extend the learning and planning framework of Xu et al. (2018) from ride-hailing to ride-pooling by embedding a ride-pooling simulation within the learning mechanism to enable non-myopic decision-making. In addition, we propose a complementary policy for rebalancing idle vehicles. By employing n-step temporal difference learning on simulated experiences, we derive spatiotemporal state values and subsequently evaluate the effectiveness of the non-myopic policy using NYC taxi request data. Results demonstrate that the non-myopic policy for matching can increase the service rate by up to 8.4% versus a myopic policy while reducing both in-vehicle and wait times for passengers. Furthermore, the proposed non-myopic policy can decrease fleet size by over 25% compared to a myopic policy, while maintaining the same level of performance, thereby offering significant cost savings for operators. Incorporating rebalancing operations into the proposed framework cuts wait time by up to 27.3%, in-vehicle time by 12.5%, and raises service rate by 15.1% compared to using the framework for matching decisions alone at the cost of increased vehicle minutes traveled per passenger.

Problem

Research questions and friction points this paper is trying to address.

Addressing myopic decision-making in large-scale ride-pooling systems

Developing non-myopic matching policies using simulation-informed reinforcement learning

Proposing vehicle rebalancing strategies to improve system efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulation-informed reinforcement learning for non-myopic ride-pooling

Embedding ride-pooling simulation into learning mechanism

Complementary rebalancing policy for idle vehicles

🔎 Similar Papers

A Reinforcement Learning Approach for Dynamic Rebalancing in Bike-Sharing System