Toward Dependency Dynamics in Multi-Agent Reinforcement Learning for Traffic Signal Control

📅 2025-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
In multi-agent reinforcement learning (MARL) for cooperative traffic signal control, dynamic agent dependency—induced by spillback congestion—hinders convergence to the globally optimal Q-value. Method: We propose a Dependency-Driven Parameter Update Strategy (DPUS), which adaptively switches between centralized and decentralized learning based on whether spillback occurs between intersections, and dynamically updates the diagonal submatrices of the DQN to guarantee theoretical convergence to the optimal Q-value. Results: In dual-intersection simulations, DPUS significantly accelerates convergence while preserving exploration capability. Experiments validate two key theoretical findings: (i) MARL achieves global optimality under spillback-free dependencies; (ii) centralized learning is necessary when spillback-induced dependencies exist. This work pioneers explicit modeling of agent dependency as a trigger for learning architecture switching, establishing a novel paradigm for MARL in dynamically coupled systems.
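The switching idea described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function name, the queue-vs-capacity spillback test, and the 0.95 threshold are all assumptions made for the example.

```python
def select_update_mode(queue_lengths, link_capacities, threshold=0.95):
    """Hypothetical sketch of DPUS-style mode selection: return
    'centralized' when any inter-intersection link is near capacity
    (a proxy for spillback), else 'decentralized' so each
    intersection's agent learns independently."""
    for q, c in zip(queue_lengths, link_capacities):
        if q >= threshold * c:  # queue close to link capacity -> spillback risk
            return "centralized"
    return "decentralized"
```

In this sketch, spillback detection acts purely as a trigger for the learning architecture, mirroring the paper's claim that decentralized learning suffices only in the spillback-free case.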

📝 Abstract
Reinforcement learning (RL) emerges as a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, with deep neural networks substantially augmenting its learning capabilities. However, centralized RL becomes impractical for ATSC involving multiple agents due to the exceedingly high dimensionality of the joint action space. Multi-agent RL (MARL) mitigates this scalability issue by decentralizing control to local RL agents. Nevertheless, this decentralized method introduces new challenges: the environment becomes partially observable from the perspective of each local agent due to constrained inter-agent communication. Both centralized RL and MARL exhibit distinct strengths and weaknesses, particularly under heavy intersectional traffic conditions. In this paper, we justify that MARL can achieve the optimal global Q-value by separating into multiple IRL (Independent Reinforcement Learning) processes when no spill-back congestion occurs (no agent dependency) among agents (intersections). In the presence of spill-back congestion (with agent dependency), the maximum global Q-value can be achieved by using centralized RL. Building upon these conclusions, we propose a novel Dynamic Parameter Update Strategy for Deep Q-Network (DQN-DPUS), which updates the weights and biases based on the dependency dynamics among agents, i.e., updating only the diagonal sub-matrices in the scenario without spill-back congestion. We validate DQN-DPUS in a simple network with two intersections under varying traffic, and show that the proposed strategy can speed up the convergence rate without sacrificing optimal exploration. The results corroborate our theoretical findings, demonstrating the efficacy of DQN-DPUS in optimizing traffic signal control.
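The "updating only the diagonal sub-matrices" step from the abstract can be pictured as masking a joint weight matrix so that cross-agent (off-diagonal) blocks are frozen in the spillback-free case. The sketch below is a minimal illustration of that idea, assuming a partition of the joint parameters into one block per intersection; the function names and the plain gradient step are not from the paper.

```python
import numpy as np

def block_diagonal_mask(block_sizes):
    """Build a 0/1 mask keeping only the diagonal sub-matrices of a
    joint weight matrix, one square block per agent (intersection)."""
    n = sum(block_sizes)
    mask = np.zeros((n, n))
    start = 0
    for b in block_sizes:
        mask[start:start + b, start:start + b] = 1.0
        start += b
    return mask

def masked_update(W, grad, lr, mask):
    """One gradient step in which off-diagonal (cross-agent) weights
    are left untouched, so each agent effectively learns independently."""
    return W - lr * (grad * mask)
```

When spillback introduces agent dependency, the mask would simply be replaced by an all-ones matrix, recovering a fully centralized update over the joint parameters.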
Problem

Research questions and friction points this paper is trying to address.

Addresses scalability in multi-agent traffic control.
Optimizes Q-value in independent and dependent scenarios.
Proposes Dynamic Parameter Update Strategy for DQN.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Agent Reinforcement Learning
Dynamic Parameter Update Strategy
Adaptive Traffic Signal Control
Yuli Zhang
Shangbo Wang
University of Sussex
Intelligent Transportation Systems, Statistics, Wireless Communication, Wireless Localization
Dongyao Jia
Pengfei Fan
Ruiyuan Jiang
Hankang Gu
Andy H.F. Chow