🤖 AI Summary
This work proposes a multi-agent reinforcement learning framework to address two weaknesses of existing methods: poor generalization under dynamic traffic conditions and a mismatch between conventional action spaces and driver expectations. The approach improves generalization through turning-ratio randomization, introduces a stability-oriented exponential phase duration action space, and integrates a neighbor-based observation mechanism within a scalable centralized training with decentralized execution (CTDE) architecture based on MAPPO. Evaluated in Vissim simulations, the method reduces average vehicle waiting time by over 10% relative to standard RL baselines and demonstrates superior generalization and control stability in unseen traffic scenarios.
📝 Abstract
Reinforcement Learning (RL) in Traffic Signal Control (TSC) faces significant hurdles in real-world deployment due to limited generalization to dynamic traffic flow variations. Existing approaches often overfit static patterns and use action spaces incompatible with driver expectations. This paper proposes a robust Multi-Agent Reinforcement Learning (MARL) framework validated in the Vissim traffic simulator. The framework integrates three mechanisms: (1) Turning Ratio Randomization, a training strategy that exposes agents to dynamic turning probabilities to enhance robustness against unseen scenarios; (2) a stability-oriented Exponential Phase Duration Adjustment action space, which balances responsiveness and precision through cyclical, exponential phase adjustments; and (3) a Neighbor-Based Observation scheme utilizing the MAPPO algorithm with Centralized Training with Decentralized Execution (CTDE). By leveraging centralized updates, this approach approximates the efficacy of global observations while maintaining scalable local communication. Experimental results demonstrate that our framework outperforms standard RL baselines, reducing average waiting time by over 10%. The proposed model exhibits superior generalization in unseen traffic scenarios and maintains high control stability, offering a practical solution for adaptive signal control.
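The abstract does not spell out how the first two mechanisms are parameterized, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that turning ratios are resampled per training episode from a Dirichlet distribution, and that the action is a signed integer whose magnitude selects a power-of-two change to the current phase duration. The function names, the three-movement split, the 1-second base step, and the 5 to 60 second bounds are all hypothetical.

```python
import random


def sample_turning_ratios(rng, movements=("left", "through", "right"), concentration=1.0):
    """Draw one random turning-probability vector (assumed: a Dirichlet sample)."""
    # Normalizing independent Gamma draws yields a Dirichlet-distributed vector.
    draws = [rng.gammavariate(concentration, 1.0) for _ in movements]
    total = sum(draws)
    return {m: d / total for m, d in zip(movements, draws)}


def adjust_phase_duration(current_s, action, base_step_s=1.0, min_s=5.0, max_s=60.0):
    """Apply a signed, exponentially sized change to a phase duration (assumed scheme).

    action is an integer in {-k, ..., 0, ..., k}: 0 keeps the duration,
    +/-1 changes it by base_step_s, +/-2 by 2 * base_step_s, +/-3 by 4 * base_step_s.
    The result is clamped to [min_s, max_s].
    """
    if action == 0:
        delta = 0.0
    else:
        magnitude = base_step_s * 2 ** (abs(action) - 1)
        delta = magnitude if action > 0 else -magnitude
    return min(max_s, max(min_s, current_s + delta))


if __name__ == "__main__":
    rng = random.Random(0)
    # A fresh turning-ratio vector would be drawn at the start of each episode.
    print(sample_turning_ratios(rng))
    # Largest positive action in a 7-action space: 30 s plus a 4 s step.
    print(adjust_phase_duration(30.0, 3))
```

Under these assumptions, small actions give fine-grained corrections while large actions let the controller react quickly, which is one plausible reading of "balances responsiveness and precision"; the paper's actual step sizes and bounds may differ.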