Reinforcement Learning-based Sequential Route Recommendation for System-Optimal Traffic Assignment

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Can personalized route recommendations achieve system-optimal (SO) traffic assignment? This paper formulates static SO path assignment as a sequential decision-making problem for a single-agent deep reinforcement learning (DRL) framework, where a central agent dynamically recommends routes in the order of OD demand arrivals to minimize total system travel time. We propose a novel MSA-guided deep Q-learning framework: (i) embedding the iterative structure of the Method of Successive Averages (MSA) into DRL training; (ii) designing an SO-informed action space for routing decisions; and (iii) employing graph neural networks to encode network state and model OD sequences. The method converges exactly to the theoretical SO solution on the Braess network and achieves only 0.35% deviation from SO on the Ortuzar–Willumsen network. Ablation studies demonstrate that the SO-guided action space improves performance by over 40% and accelerates convergence by 2.3×.
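The Method of Successive Averages (MSA) whose iterative structure the framework embeds can be illustrated on a toy instance. The two-link network, cost functions, and iteration count below are illustrative assumptions, not taken from the paper:

```python
# MSA for system-optimal (SO) assignment on a toy two-link network (illustrative).
# Link travel times: t1(x) = 10 + x, t2(x) = 20 + x; total demand = 10.
# SO condition: equal marginal system costs, d(x*t(x))/dx = t(x) + x*t'(x).

demand = 10.0
x = [demand, 0.0]  # start with an all-or-nothing assignment to link 1

for k in range(1, 2001):
    # Marginal system cost of each link at the current flows
    mc = [10 + 2 * x[0], 20 + 2 * x[1]]
    # Auxiliary all-or-nothing assignment to the cheapest marginal-cost link
    y = [demand, 0.0] if mc[0] <= mc[1] else [0.0, demand]
    # MSA averaging step with diminishing step size 1/k
    x = [xi + (yi - xi) / k for xi, yi in zip(x, y)]

# Analytical SO split: 10 + 2*x1 = 20 + 2*x2 with x1 + x2 = 10 -> x1 = 7.5
print(x)
```

The diminishing 1/k step size is what guarantees convergence despite the oscillating all-or-nothing targets; the paper reuses this structure to guide DQN training rather than running MSA standalone.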

📝 Abstract
Modern navigation systems and shared mobility platforms increasingly rely on personalized route recommendations to improve individual travel experience and operational efficiency. However, a key question remains: can such sequential, personalized routing decisions collectively lead to system-optimal (SO) traffic assignment? This paper addresses this question by proposing a learning-based framework that reformulates the static SO traffic assignment problem as a single-agent deep reinforcement learning (RL) task. A central agent sequentially recommends routes to travelers as origin-destination (OD) demands arrive, with the objective of minimizing total system travel time. To enhance learning efficiency and solution quality, we develop an MSA-guided deep Q-learning algorithm that integrates the iterative structure of traditional traffic assignment methods into the RL training process. The proposed approach is evaluated on both the Braess and Ortuzar–Willumsen (OW) networks. Results show that the RL agent converges to the theoretical SO solution in the Braess network and achieves only a 0.35% deviation in the OW network. Further ablation studies demonstrate that the design of the route action set significantly impacts convergence speed and final performance, with SO-informed route sets leading to faster learning and better outcomes. This work provides a theoretically grounded and practically relevant approach to bridging individual routing behavior with system-level efficiency through learning-based sequential assignment.
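The sequential decision process described in the abstract can be sketched with a greedy stand-in for the learned policy: each arriving traveler is assigned to the route with the smallest marginal increase in total system travel time. The two-route network, cost functions, and the reward interpretation below are illustrative assumptions, not the paper's setup:

```python
# Greedy sequential assignment on a toy two-route network (a stand-in for the
# learned DQN policy): each arriving traveler is routed so as to least
# increase total system travel time.
# Route times: t1(x) = x (congestible), t2(x) = 10 (constant); 10 travelers.

def total_time(f1, f2):
    # Total system travel time = sum over links of (flow * travel time)
    return f1 * f1 + 10 * f2  # f1*t1(f1) + f2*t2(f2)

flows = [0, 0]
for _ in range(10):  # travelers (OD demands) arrive one at a time
    d1 = total_time(flows[0] + 1, flows[1]) - total_time(*flows)
    d2 = total_time(flows[0], flows[1] + 1) - total_time(*flows)
    # In an RL formulation, the reward would be the negative of this increment
    if d1 <= d2:
        flows[0] += 1
    else:
        flows[1] += 1

print(flows, total_time(*flows))
```

Here the marginal-cost rule happens to recover the exact SO split (5/5, total time 75); on general networks the paper's DRL agent learns such a policy from experience rather than computing marginals in closed form.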
Problem

Research questions and friction points this paper is trying to address.

Can personalized routing achieve system-optimal traffic assignment?
Proposes RL framework to minimize total system travel time.
Evaluates approach on Braess and OW networks.
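The Braess network is a useful benchmark precisely because its SO and user-equilibrium (UE) assignments differ. A standard parameterization (illustrative; the paper's exact link costs may differ) uses a demand of 6 from node 1 to node 4 with link times t12 = 10·x, t13 = 50 + x, t23 = 10 + x, t24 = 50 + x, t34 = 10·x:

```python
# Total system travel time on the classic Braess network (illustrative
# parameterization, not necessarily the one used in the paper).
# Routes: A = 1-2-4, B = 1-3-4, C = 1-2-3-4 (via the "bridge" link 2-3).

def system_time(fA, fB, fC):
    # Link flows induced by the route flows
    x12, x13, x23, x24, x34 = fA + fC, fB, fC, fA, fB + fC
    # Link travel times under those flows
    t12, t13, t23, t24, t34 = 10 * x12, 50 + x13, 10 + x23, 50 + x24, 10 * x34
    # Sum of (link flow) * (link time) over all links
    return x12 * t12 + x13 * t13 + x23 * t23 + x24 * t24 + x34 * t34

so_total = system_time(3, 3, 0)  # SO: split evenly, leave the bridge unused
ue_total = system_time(2, 2, 2)  # UE: all three routes equally attractive
print(so_total, ue_total)
```

The SO assignment (total time 498) beats the UE assignment (552) because UE travelers overuse the bridge; converging to the SO solution here, as the paper reports, means the agent learns to keep the bridge route empty.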
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning for system-optimal traffic assignment
MSA-guided deep Q-learning algorithm integration
SO-informed route sets enhance learning efficiency
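One plausible reading of the SO-informed route sets is that the action space is a compact set of candidate paths per OD pair rather than arbitrary link-level choices. A minimal sketch of building such a candidate set by enumerating simple paths on the Braess topology (the graph, free-flow costs, and cost-based ranking are illustrative assumptions):

```python
# Build a candidate route set (action space) for one OD pair by enumerating
# simple paths, illustrated on the Braess topology (nodes 1..4).
adj = {1: [2, 3], 2: [3, 4], 3: [4], 4: []}
free_flow = {(1, 2): 0, (1, 3): 50, (2, 3): 10, (2, 4): 50, (3, 4): 0}

def simple_paths(node, dest, path):
    # Depth-first enumeration of all simple (cycle-free) paths node -> dest
    if node == dest:
        yield path
        return
    for nxt in adj[node]:
        if nxt not in path:
            yield from simple_paths(nxt, dest, path + [nxt])

routes = list(simple_paths(1, 4, [1]))
# Rank candidates by free-flow cost to form a compact, SO-leaning action set
routes.sort(key=lambda p: sum(free_flow[e] for e in zip(p, p[1:])))
print(routes)
```

Restricting the agent's actions to a small, well-chosen route set is consistent with the reported ablation result that the action-space design drives both convergence speed and final performance.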