Deep Reinforcement Learning for Dynamic Origin-Destination Matrix Estimation in Microscopic Traffic Simulations Considering Credit Assignment

๐Ÿ“… 2025-11-09
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Dynamic origin-destination (OD) matrix estimation (DODE) poses a fundamental challenge in microscopic traffic simulation: path uncertainty induces ambiguous credit assignment between OD flows and observed link volumes. This paper reformulates DODE as a Markov decision process (MDP) and proposes a model-free, deep reinforcement learning-based sequential decision-making framework that trains a policy interactively within the SUMO simulation environment. Unlike conventional approaches that rely on explicit, static flow-mapping functions, the method directly optimizes the OD matrix to minimize the discrepancy between simulated and observed link flows. Experiments on the Nguyen-Dupuis network show a 43.2% reduction in mean squared error relative to the best-performing conventional baseline, improving both the calibration accuracy and the robustness of microscopic traffic simulations.
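The calibration objective described above, the discrepancy between simulated and observed link flows, can be sketched as a simple MSE over interval-by-link volume arrays. This is a minimal illustration; the function name and array shapes are assumptions, not from the paper:

```python
import numpy as np

def link_flow_mse(simulated, observed):
    """Mean squared error between simulated and observed link volumes.

    Both arrays have shape (n_intervals, n_links), e.g. six 5-minute
    intervals over the paper's 30-minute evaluation horizon.
    """
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((simulated - observed) ** 2))
```

In the paper's setting, `simulated` would come from a SUMO run driven by the estimated OD matrices; here it is just an input array.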

๐Ÿ“ Abstract
This paper focuses on dynamic origin-destination matrix estimation (DODE), a calibration process essential to the effective application of microscopic traffic simulations. The fundamental difficulty of DODE in microscopic simulations stems from complex temporal dynamics and the inherent uncertainty of individual vehicle behavior: it is hard to determine precisely which vehicle traverses which link at any given moment, so the relationship between origin-destination (OD) matrices and the link flows they produce is intricate and often ambiguous. This is the credit assignment problem, the central challenge addressed in this study. We formulate DODE as a Markov Decision Process (MDP) and propose a novel framework based on model-free deep reinforcement learning (DRL). Within this framework, the agent learns a policy that sequentially generates OD matrices, refining its strategy through direct interaction with the simulation environment. The method is validated on the Nguyen-Dupuis network using SUMO, with performance evaluated against ground-truth link flows aggregated at 5-minute intervals over a 30-minute horizon. Experimental results show a 43.2% reduction in mean squared error (MSE) compared with the best-performing conventional baseline. By reframing DODE as a sequential decision-making problem, the approach addresses the credit assignment challenge through its learned policy, overcoming the limitations of conventional methods and offering a new framework for calibrating microscopic traffic simulations.
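The MDP formulation the abstract describes, where an agent emits an OD matrix per interval and is rewarded by the (negative) discrepancy against observed link flows, can be sketched as a toy environment. The linear `assignment` matrix below is a hypothetical stand-in for the SUMO microsimulation, and every class and variable name is illustrative, not from the paper:

```python
import numpy as np

class DODEEnv:
    """Toy sequential OD-estimation environment in the spirit of the
    paper's MDP formulation. A fixed linear OD-to-link-flow map replaces
    the actual SUMO simulation step."""

    def __init__(self, assignment, observed):
        # assignment: (n_links, n_od) stand-in for the simulator's
        # assignment dynamics; observed: (n_intervals, n_links) targets
        self.assignment = np.asarray(assignment, dtype=float)
        self.observed = np.asarray(observed, dtype=float)
        self.n_intervals = self.observed.shape[0]
        self.t = 0

    def reset(self):
        self.t = 0
        # state: observed link flows for the upcoming interval
        return self.observed[self.t]

    def step(self, od_vector):
        # "simulate": map the proposed OD demand to link flows
        simulated = self.assignment @ np.asarray(od_vector, dtype=float)
        # reward: negative MSE against this interval's observed flows,
        # the credit signal the policy learns from
        reward = -float(np.mean((simulated - self.observed[self.t]) ** 2))
        self.t += 1
        done = self.t >= self.n_intervals
        next_state = self.observed[self.t] if not done else None
        return next_state, reward, done
```

A DRL agent (e.g. an actor-critic policy) would interact with such an environment episode by episode; in the paper the `step` call wraps a SUMO run rather than a matrix product.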
Problem

Research questions and friction points this paper is trying to address.

Estimating dynamic origin-destination matrices in microscopic traffic simulations
Resolving the ambiguous credit assignment between OD matrices and resulting link flows
Developing a deep reinforcement learning framework for sequential OD matrix generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies model-free deep reinforcement learning to traffic simulation calibration
Formulates OD estimation as a Markov Decision Process
Learns an optimal policy through interaction with the simulation environment
๐Ÿ”Ž Similar Papers
No similar papers found.
Donggyu Min
Department of Civil and Environmental Engineering, Seoul National University, Seoul 08826, Republic of Korea
Seongjin Choi
Department of Civil, Environmental, and Geo-Engineering, University of Minnesota, Minneapolis, MN 55455, USA
Dong-Kyu Kim
Seoul National University