RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing

📅 2025-12-13
🤖 AI Summary
To address the challenge of jointly optimizing passenger waiting time and system efficiency under high supply-demand uncertainty in ride-hailing platforms, this paper formulates adaptive delayed matching as a state-aware, regime-aware Markov decision process that integrates spatiotemporal dynamics with traffic physics. We propose a self-attention-driven sparse Mixture-of-Experts (MoE) encoder enabling automatic expert specialization, coupled with a physics-constrained congestion surrogate model and an adaptive reward mechanism. Evaluated on real-world Uber trajectory data from San Francisco, our method achieves over 13% higher total reward compared to strong baselines, reduces matching and pickup delays by 10% and 15%, respectively, and attains state-of-the-art performance with only 12M parameters. The approach demonstrates strong cross-scenario robustness and training stability.

📝 Abstract
Ride-hailing platforms face the challenge of balancing passenger waiting times with overall system efficiency under highly uncertain supply-demand conditions. Adaptive delayed matching creates a trade-off between matching and pickup delays by deciding whether to assign drivers immediately or batch requests. Since outcomes accumulate over long horizons with stochastic dynamics, reinforcement learning (RL) is a suitable framework. However, existing approaches often oversimplify traffic dynamics or use shallow encoders that miss complex spatiotemporal patterns. We introduce the Regime-Aware Spatio-Temporal Mixture-of-Experts (RAST-MoE), which formalizes adaptive delayed matching as a regime-aware MDP equipped with a self-attention MoE encoder. Unlike monolithic networks, our experts specialize automatically, improving representation capacity while maintaining computational efficiency. A physics-informed congestion surrogate preserves realistic density-speed feedback, enabling millions of efficient rollouts, while an adaptive reward scheme guards against pathological strategies. With only 12M parameters, our framework outperforms strong baselines. On real-world Uber trajectory data (San Francisco), it improves total reward by over 13%, reducing average matching and pickup delays by 10% and 15% respectively. It demonstrates robustness across unseen demand regimes and stable training. These findings highlight the potential of MoE-enhanced RL for large-scale decision-making with complex spatiotemporal dynamics.
Problem

Research questions and friction points this paper is trying to address.

Balancing passenger waiting times with system efficiency under uncertain supply-demand
Formalizing adaptive delayed matching as a regime-aware MDP with spatiotemporal patterns
Improving representation capacity and computational efficiency for ride-hailing reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Regime-aware MDP with self-attention MoE encoder
Physics-informed congestion surrogate for efficient rollouts
Adaptive reward scheme to prevent pathological strategies
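The core routing idea behind a sparse Mixture-of-Experts encoder is a learned gate that sends each input to only a few experts, so capacity grows without a matching growth in compute. A minimal NumPy sketch of top-k sparse gating follows; the linear gate, toy linear experts, and all shapes are illustrative assumptions, not the paper's actual 12M-parameter architecture:

```python
import numpy as np

def sparse_moe_gate(x, w_gate, k=2):
    """Top-k sparse gating: score all experts, keep only the k best."""
    logits = x @ w_gate                       # (num_experts,) gating scores
    topk = np.argsort(logits)[-k:]            # indices of the k highest-scoring experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                  # softmax over the selected experts only
    return topk, weights

def moe_forward(x, experts, w_gate, k=2):
    """Combine the outputs of the k selected experts, weighted by the gate."""
    idx, w = sparse_moe_gate(x, w_gate, k)
    return sum(wi * experts[i](x) for i, wi in zip(idx, w))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
w_gate = rng.normal(size=(d, n_experts))
# Toy experts: each is a plain linear map standing in for an expert sub-network.
expert_ws = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, W=W: x @ W for W in expert_ws]

x = rng.normal(size=d)                        # stand-in for a state embedding
y = moe_forward(x, experts, w_gate, k=2)
print(y.shape)                                # (8,)
```

Because only k of the n_experts experts run per input, the forward cost scales with k rather than n_experts; in a regime-aware setting the intent is that the gate learns to send, say, peak-congestion states and off-peak states to different specialized experts.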
Authors
Yuhan Tang, Massachusetts Institute of Technology
Kangxin Cui, Massachusetts Institute of Technology
Jung Ho Park, University of California, Berkeley
Yibo Zhao, Massachusetts Institute of Technology
Xuan Jiang, PhD @ UC Berkeley, Research Affiliate @ MIT, SWE @ Google, ex-Student Researcher @ LBNL (High Performance Computing, Artificial Intelligence, Large Language Models, Post-Training, RL)
Haoze He, Carnegie Mellon University
Dingyi Zhuang, Massachusetts Institute of Technology
Shenhao Wang, University of Florida; Massachusetts Institute of Technology (Urban AI, Computational Social Science, Travel Behavior, Urban Systems, Resilience)
Jiangbo Yu, University of Washington
Haris Koutsopoulos, Northeastern University
Jinhua Zhao, Massachusetts Institute of Technology