Deep Reinforcement Learning for Dynamic Origin-Destination Matrix Estimation in Microscopic Traffic Simulations Considering Credit Assignment

๐Ÿ“… 2025-11-09
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Dynamic origin-destination (OD) matrix estimation (DODE) poses a fundamental challenge in microscopic traffic simulation: path uncertainty induces ambiguous credit assignment between OD flows and observed link volumes. This paper reformulates DODE as a Markov decision process (MDP) and proposes a model-free, deep reinforcement learning-based sequential decision-making framework that trains a policy interactively within the SUMO simulation environment. Unlike conventional approaches that rely on explicit, static flow-mapping functions, the method directly optimizes the OD matrix to minimize the discrepancy between simulated and observed link flows. Experiments on the Nguyen-Dupuis network show a 43.2% reduction in mean squared error relative to the best-performing conventional baseline, improving both the calibration accuracy and the robustness of microscopic traffic simulations.
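The calibration objective described above, the discrepancy between simulated and observed link flows, can be sketched as a simple MSE over interval-by-link volume arrays. This is a minimal illustration; the function name and array shapes are assumptions, not from the paper:

```python
import numpy as np

def link_flow_mse(simulated, observed):
    """Mean squared error between simulated and observed link volumes.

    Both arrays have shape (n_intervals, n_links), e.g. six 5-minute
    intervals over the paper's 30-minute evaluation horizon.
    """
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean((simulated - observed) ** 2))
```

In the paper's setting, `simulated` would come from a SUMO run driven by the estimated OD matrices; here it is just an input array.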

๐Ÿ“ Abstract
This paper focuses on dynamic origin-destination matrix estimation (DODE), a calibration process essential to the effective application of microscopic traffic simulations. The fundamental difficulty of DODE in microscopic simulations stems from complex temporal dynamics and the inherent uncertainty of individual vehicle behavior: it is hard to determine precisely which vehicle traverses which link at any given moment, so the relationship between origin-destination (OD) matrices and the link flows they produce is intricate and often ambiguous. This is the credit assignment problem, the central challenge addressed in this study. We formulate DODE as a Markov Decision Process (MDP) and propose a novel framework based on model-free deep reinforcement learning (DRL). Within this framework, the agent learns a policy that sequentially generates OD matrices, refining its strategy through direct interaction with the simulation environment. The method is validated on the Nguyen-Dupuis network using SUMO, with performance evaluated against ground-truth link flows aggregated at 5-minute intervals over a 30-minute horizon. Experimental results show a 43.2% reduction in mean squared error (MSE) compared with the best-performing conventional baseline. By reframing DODE as a sequential decision-making problem, the approach addresses the credit assignment challenge through its learned policy, overcoming the limitations of conventional methods and offering a new framework for calibrating microscopic traffic simulations.
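The MDP formulation the abstract describes, where an agent emits an OD matrix per interval and is rewarded by the (negative) discrepancy against observed link flows, can be sketched as a toy environment. The linear `assignment` matrix below is a hypothetical stand-in for the SUMO microsimulation, and every class and variable name is illustrative, not from the paper:

```python
import numpy as np

class DODEEnv:
    """Toy sequential OD-estimation environment in the spirit of the
    paper's MDP formulation. A fixed linear OD-to-link-flow map replaces
    the actual SUMO simulation step."""

    def __init__(self, assignment, observed):
        # assignment: (n_links, n_od) stand-in for the simulator's
        # assignment dynamics; observed: (n_intervals, n_links) targets
        self.assignment = np.asarray(assignment, dtype=float)
        self.observed = np.asarray(observed, dtype=float)
        self.n_intervals = self.observed.shape[0]
        self.t = 0

    def reset(self):
        self.t = 0
        # state: observed link flows for the upcoming interval
        return self.observed[self.t]

    def step(self, od_vector):
        # "simulate": map the proposed OD demand to link flows
        simulated = self.assignment @ np.asarray(od_vector, dtype=float)
        # reward: negative MSE against this interval's observed flows,
        # the credit signal the policy learns from
        reward = -float(np.mean((simulated - self.observed[self.t]) ** 2))
        self.t += 1
        done = self.t >= self.n_intervals
        next_state = self.observed[self.t] if not done else None
        return next_state, reward, done
```

A DRL agent (e.g. an actor-critic policy) would interact with such an environment episode by episode; in the paper the `step` call wraps a SUMO run rather than a matrix product.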
Problem

Research questions and friction points this paper is trying to address.

Estimating dynamic origin-destination matrices in microscopic traffic simulations
Resolving the ambiguous credit assignment between OD matrices and resulting link flows
Developing a deep reinforcement learning framework for sequential OD matrix generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies model-free deep reinforcement learning to traffic simulation calibration
Formulates OD estimation as a Markov Decision Process
Learns an optimal policy through interaction with the simulation environment
๐Ÿ”Ž Similar Papers
No similar papers found.
Donggyu Min
Department of Civil and Environmental Engineering, Seoul National University, Seoul 08826, Republic of Korea
Seongjin Choi
Department of Civil, Environmental, and Geo-Engineering, University of Minnesota, Minneapolis, MN 55455, USA
Dong-Kyu Kim
Seoul National University