🤖 AI Summary
To address the limitations of conventional fixed-time and actuated traffic signal control in adapting to dynamic traffic flows, this paper proposes a reinforcement learning (RL)-based adaptive traffic signal control framework that minimizes the total queue length across all signal phases. Methodologically, we design a lightweight yet expressive multi-dimensional state representation—integrating an expanded state space, an autoencoder, and K-Planes-inspired feature encoding—and optimize the control policy using the Proximal Policy Optimization (PPO) algorithm. Training and evaluation are conducted in the SUMO simulation environment with a queue-length-driven sparse reward function. Experimental results show that, under the optimal configuration, our approach reduces average queue length by approximately 29% compared to the Webster method and significantly outperforms existing RL-based baselines. The key contributions are: (1) an efficient, low-overhead state representation mechanism; and (2) a robust, generalizable PPO training paradigm tailored for real-world deployment.
📝 Abstract
Efficient traffic signal control (TSC) is crucial for reducing congestion, travel delays, and pollution, and for ensuring road safety. Traditional approaches, such as fixed-time control and actuated control, often struggle to handle dynamic traffic patterns. In this study, we propose a novel adaptive TSC framework that leverages Reinforcement Learning (RL), using the Proximal Policy Optimization (PPO) algorithm, to minimize total queue lengths across all signal phases. The challenge of efficiently representing highly stochastic traffic conditions for an RL controller is addressed through multiple state representations, including an expanded state space, an autoencoder representation, and a K-Planes-inspired representation. The proposed algorithm has been implemented in the Simulation of Urban MObility (SUMO) traffic simulator and demonstrates superior performance over both traditional methods and conventional RL-based approaches in reducing queue lengths. The best-performing configuration achieves an approximately 29% reduction in average queue length compared to the traditional Webster method. Furthermore, a comparative evaluation of alternative reward formulations demonstrates the effectiveness of the proposed queue-based approach, showcasing its potential for scalable and adaptive urban traffic management.
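The queue-based reward described above can be sketched minimally as follows. This is a hypothetical illustration, not the paper's implementation: in an actual SUMO setup the per-lane halting counts would be read through TraCI (e.g. `traci.lane.getLastStepHaltingNumber`), whereas here they are passed in as plain numbers, and the function name `queue_reward` is our own.

```python
def queue_reward(queue_lengths):
    """Reward as the negative total queue length across all signal phases,
    so that shorter queues yield a higher (less negative) reward.

    queue_lengths: iterable of halted-vehicle counts, one per phase/lane.
    """
    return -sum(queue_lengths)


# Example: three approaches with 4, 0, and 7 halted vehicles.
print(queue_reward([4, 0, 7]))  # -11
```

An RL agent maximizing this signal is driven to choose phase switches that drain the longest queues, which is consistent with the objective of minimizing total queue length stated in the abstract.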