🤖 AI Summary
Multi-agent reinforcement learning (MARL) for adaptive traffic signal control (ATSC) suffers from the curse of dimensionality, partial observability, and insufficient inter-agent coordination. Method: This paper proposes a region-based Semi-Centralized Training, Decentralized Execution (SEMI-CTDE) architecture. It partitions the road network into regions of tightly coupled intersections, shares parameters among the agents within each region, and uses composite state and reward formulations that balance regional coordination with local adaptability. Crucially, it combines local observations with a lightweight encoding of regional information, enhancing policy consistency and transferability without imposing online communication overhead. Contribution/Results: Experiments demonstrate that the method consistently outperforms rule-based and fully decentralized baselines across diverse traffic densities and flow distributions. It generalizes well and is compatible with various policy backbone architectures, enabling flexible deployment and scalability.
📝 Abstract
Multi-agent reinforcement learning (MARL) has emerged as a promising paradigm for adaptive traffic signal control (ATSC) of multiple intersections. Existing approaches typically follow either a fully centralized or a fully decentralized design. Fully centralized approaches suffer from the curse of dimensionality and from reliance on a single learning server, whereas purely decentralized approaches operate under severe partial observability and lack explicit coordination, resulting in suboptimal performance. These limitations motivate region-based MARL, where the network is partitioned into regions of tightly coupled intersections and training is organized around these regions. This paper introduces a Semi-Centralized Training, Decentralized Execution (SEMI-CTDE) architecture for multi-intersection ATSC. Within each region, SEMI-CTDE performs centralized training with regional parameter sharing and employs composite state and reward formulations that jointly encode local and regional information. The architecture is highly transferable across different policy backbones and state-reward instantiations. Building on this architecture, we implement two models with distinct design objectives. A multi-perspective experimental analysis of the two SEMI-CTDE-based models, covering ablations of the architecture's core elements and comparisons against rule-based and fully decentralized baselines, shows that they achieve consistently superior performance and remain effective across a wide range of traffic densities and distributions.
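To make the idea of regional parameter sharing with composite state and reward more concrete, the following is a minimal Python sketch, not the paper's implementation. Everything in it is an illustrative assumption: the region partition `REGIONS`, the use of per-lane queue lengths as observations, the mean regional queue as the "regional information", the negative-queue reward, and the mixing weight `weight`. The abstract does not specify any of these details.

```python
from statistics import mean

# Hypothetical partition of four intersections into two regions.
REGIONS = {"r0": ["i0", "i1"], "r1": ["i2", "i3"]}

def regional_summary(queues, members):
    """Assumed regional feature: mean total queue length per intersection."""
    return mean(sum(queues[i]) for i in members)

def composite_state(queues, agent, members):
    """Local per-lane queues followed by one regional summary value."""
    return list(queues[agent]) + [regional_summary(queues, members)]

def composite_reward(queues, agent, members, weight=0.5):
    """Weighted mix of a local and a regional negative-queue reward."""
    local = -sum(queues[agent])
    regional = -regional_summary(queues, members)
    return (1 - weight) * local + weight * regional

# Regional parameter sharing: one parameter object per region, and every
# agent in a region refers to that same object (a dict stands in for a
# policy network here).
region_params = {region: {"weights": []} for region in REGIONS}
policy_of = {
    agent: region_params[region]
    for region, members in REGIONS.items()
    for agent in members
}

# Example per-lane queue lengths (made-up numbers for illustration).
queues = {"i0": [2, 4], "i1": [1, 1], "i2": [0, 3], "i3": [5, 0]}
state_i0 = composite_state(queues, "i0", REGIONS["r0"])
reward_i0 = composite_reward(queues, "i0", REGIONS["r0"])
```

In this sketch, each agent needs only its own observation plus a single regional scalar at execution time, while all agents in a region update the same parameters during training. With `weight=0` the reward reduces to a purely local signal, and with `weight=1` to a purely regional one.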