🤖 AI Summary
Problem: Existing multi-agent reinforcement learning (MARL) frameworks for large-scale regional adaptive traffic signal control suffer from poor scalability and misalignment with real-world centralized traffic management systems.
Method: This paper proposes a single-agent reinforcement learning approach, wherein a centralized controller jointly optimizes signal timings across multiple intersections. Crucially, probe vehicle travel times are leveraged to reliably estimate inbound lane queue lengths, enabling a lightweight, deployable state–action–reward formulation. The design deliberately abandons distributed MARL architectures to conform to the operational reality of centralized traffic control.
Results: Evaluated in SUMO simulations, the method significantly reduces both average network-wide delay and total queue length. It demonstrates strong efficacy in alleviating congestion across large-scale road networks and exhibits high potential for practical engineering deployment.
📝 Abstract
Several studies have applied reinforcement learning (RL) to regional adaptive traffic signal control (ATSC) and achieved promising results. Existing research in this field predominantly adopts multi-agent frameworks, but such frameworks present scalability challenges. Instead, the traffic signal control (TSC) problem calls for a single-agent framework: TSC inherently relies on centralized management by a single control center, which can monitor traffic conditions on all roads in the study area and coordinate the control of all intersections. This work proposes a single-agent RL-based regional ATSC model compatible with probe vehicle technology. The key components of the RL design are the state, action, and reward function definitions. To facilitate learning and manage congestion, both the state and the reward function are defined in terms of queue length, and the action is designed to regulate queue dynamics. The queue length definition used in this study differs slightly from conventional definitions but is closely correlated with congestion states. More importantly, it can be estimated reliably from the link travel time data reported by probe vehicles. Because probe vehicle data already cover most urban roads, this feature enhances the proposed method's potential for widespread deployment. The method was comprehensively evaluated on the SUMO simulation platform. Experimental results demonstrate that the proposed model effectively mitigates large-scale regional congestion through coordinated multi-intersection control.
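The abstract does not give the exact queue length definition or the estimation formula, so the following is only a minimal sketch of how a queue-based state and reward could be derived from probe travel times. It assumes a simple delay model in which each queued vehicle ahead adds roughly one discharge headway of delay. The `Link` fields, the 2.0 s headway, the 7.5 m jam spacing, and the function names are all hypothetical illustration choices, not the paper's formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """An inbound link; every field is a hypothetical illustration parameter."""
    length_m: float                    # link length in metres
    free_flow_speed: float             # free-flow speed in m/s
    lanes: int = 1                     # number of inbound lanes
    discharge_headway_s: float = 2.0   # assumed saturation headway per lane
    vehicle_spacing_m: float = 7.5     # assumed jam spacing per queued vehicle


def estimate_queue(link: Link, probe_travel_time_s: float) -> float:
    """Rough queue estimate (vehicles) from one probe link travel time.

    Assumption: delay beyond free-flow time is caused by the queue ahead,
    and each queued vehicle per lane contributes one discharge headway.
    """
    free_flow_time = link.length_m / link.free_flow_speed
    delay = max(0.0, probe_travel_time_s - free_flow_time)
    queue = delay / link.discharge_headway_s * link.lanes
    # Cap the estimate at the link's physical storage capacity.
    capacity = link.length_m / link.vehicle_spacing_m * link.lanes
    return min(queue, capacity)


def build_state(links, travel_times):
    """State for the single central agent: one queue estimate per inbound link."""
    return [estimate_queue(link, t) for link, t in zip(links, travel_times)]


def reward(state):
    """Reward sketch: negative total estimated queue over the whole region."""
    return -sum(state)


if __name__ == "__main__":
    links = [Link(length_m=300.0, free_flow_speed=15.0),
             Link(length_m=300.0, free_flow_speed=15.0)]
    state = build_state(links, travel_times=[50.0, 10.0])
    print(state, reward(state))
```

Because a single agent observes every inbound link at once, the state is one flat vector for the whole region and the reward is one scalar, which is what distinguishes this setup from a multi-agent design with per-intersection observations.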