🤖 AI Summary
To address the poor scalability and unstable convergence of multi-agent reinforcement learning (MARL) in large-scale urban traffic assignment, this paper proposes MARL-OD-DA, a MARL framework in which each origin-destination (OD) pair acts as a router agent instead of each traveler being modeled as a separate agent. The framework combines a Dirichlet-based action space with action pruning and a reward function based on the local relative gap, which together improve solution reliability and speed up convergence. On the Sioux Falls network, MARL-OD-DA reaches better assignment solutions in 10 steps, with a relative gap 94.99% lower than that of conventional methods, and experiments on medium-sized networks show it surpassing existing MARL methods. Shifting from individual-traveler to OD-pair-level agents removes the scalability bottleneck of fine-grained agent representations and makes MARL more practical for large-scale traffic assignment.
📝 Abstract
The evolution of metropolitan cities and the increase in travel demands impose stringent requirements on traffic assignment methods. Multi-agent reinforcement learning (MARL) approaches outperform traditional methods in modeling adaptive routing behavior without requiring explicit system dynamics, which is beneficial for real-world deployment. However, MARL frameworks face challenges in scalability and reliability when managing extensive networks with substantial travel demand, which limits their practical applicability in solving large-scale traffic assignment problems. To address these challenges, this study introduces MARL-OD-DA, a new MARL framework for the traffic assignment problem, which redefines agents as origin-destination (OD) pair routers rather than individual travelers, significantly enhancing scalability. Additionally, a Dirichlet-based action space with action pruning and a reward function based on the local relative gap are designed to enhance solution reliability and improve convergence efficiency. Experiments demonstrate that the proposed MARL framework effectively handles medium-sized networks with extensive and varied city-level OD demand, surpassing existing MARL methods. When implemented in the SiouxFalls network, MARL-OD-DA achieves better assignment solutions in 10 steps, with a relative gap that is 94.99% lower than that of conventional methods.
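The abstract does not give the exact formulation, so the following is only a minimal sketch of how one OD-pair agent might turn a Dirichlet sample into path flows and score them. Everything in it is an assumption made for illustration: the function names, the pruning rule (zero out shares below a fixed threshold, then renormalize), the threshold value, and the local relative gap taken as the OD pair's excess travel cost over its cheapest path, divided by its total travel cost. The paper's own definitions may differ.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def prune_and_renormalize(shares, threshold=0.05):
    """Hypothetical pruning rule: drop paths whose share is below `threshold`.

    The largest share is always kept so at least one path carries flow.
    """
    best = max(range(len(shares)), key=shares.__getitem__)
    kept = [s if (s >= threshold or i == best) else 0.0
            for i, s in enumerate(shares)]
    total = sum(kept)
    return [s / total for s in kept]


def local_relative_gap(path_flows, path_costs):
    """Assumed per-OD relative gap: (total cost - demand * min cost) / total cost."""
    total_cost = sum(f * c for f, c in zip(path_flows, path_costs))
    if total_cost == 0.0:
        return 0.0
    best_cost = sum(path_flows) * min(path_costs)
    return (total_cost - best_cost) / total_cost


# Illustrative numbers only: one OD pair with three candidate paths.
rng = random.Random(0)
demand = 1000.0
shares = prune_and_renormalize(sample_dirichlet([2.0, 1.0, 0.5], rng))
flows = [demand * s for s in shares]
costs = [10.0, 12.0, 15.0]  # fixed here; in practice costs depend on all agents' flows
reward = -local_relative_gap(flows, costs)  # smaller gap gives a higher reward
```

A Dirichlet sample is non-negative and sums to one, so it maps directly onto a split of the OD demand across candidate paths, and the assumed gap is zero exactly when all flow sits on minimum-cost paths.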