Timing the Match: A Deep Reinforcement Learning Approach for Ride-Hailing and Ride-Pooling Services

📅 2025-03-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limitations of fixed-time-window matching in ride-hailing and carpooling—namely prolonged passenger wait times, excessive driver deadheading, and poor responsiveness to real-time supply-demand fluctuations—this paper proposes a deep reinforcement learning (PPO)-based method for dynamic matching timing decisions. We formulate the matching trigger time as a sequential decision-making problem for the first time and employ potential-based reward shaping (PBRS) to mitigate sparse reward challenges, enabling end-to-end adaptive temporal control. The method is integrated into a high-fidelity urban traffic simulator trained on real-world data. Empirical evaluation on real-world datasets demonstrates significant reductions in average passenger waiting time and carpool detour delay. System-wide efficiency improves by 23.6% over the optimal fixed-interval baseline, establishing a scalable, intelligent temporal optimization paradigm for real-time supply-demand matching.

Technology Category

Application Category

📝 Abstract
Efficient timing in ride-matching is crucial for improving the performance of ride-hailing and ride-pooling services, as it determines the number of drivers and passengers considered in each matching process. Traditional batched matching methods often use fixed time intervals to accumulate ride requests before assigning matches. While this approach increases the number of available drivers and passengers for matching, it fails to adapt to real-time supply-demand fluctuations, often leading to longer passenger wait times and driver idle periods. To address this limitation, we propose an adaptive ride-matching strategy using deep reinforcement learning (RL) to dynamically determine when to perform matches based on real-time system conditions. Unlike fixed-interval approaches, our method continuously evaluates system states and executes matching at moments that minimize total passenger wait time. Additionally, we incorporate a potential-based reward shaping (PBRS) mechanism to mitigate sparse rewards, accelerating RL training and improving decision quality. Extensive empirical evaluations using a realistic simulator trained on real-world data demonstrate that our approach outperforms fixed-interval matching strategies, significantly reducing passenger waiting times and detour delays, thereby enhancing the overall efficiency of ride-hailing and ride-pooling systems.
Problem

Research questions and friction points this paper is trying to address.

Adaptive ride-matching for real-time supply-demand fluctuations.
Minimizing passenger wait time using deep reinforcement learning.
Improving ride-hailing and ride-pooling efficiency with dynamic matching.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep reinforcement learning for adaptive ride-matching
Dynamic matching based on real-time system conditions
Potential-based reward shaping to accelerate RL training
Y
Yiman Bao
Delft University of Technology, Delft, The Netherlands, the department of Transport & Planning
J
Jie Gao
Delft University of Technology, Delft, The Netherlands, the department of Transport & Planning
Jinke He
Jinke He
Wayve
LearningPlanningEmbodied AI
F
F. Oliehoek
Delft University of Technology, Delft, The Netherlands, the department of Intelligent Systems
O
O. Cats
Delft University of Technology, Delft, The Netherlands, the department of Transport & Planning