🤖 AI Summary
Existing traffic signal control methods based on reinforcement learning predominantly optimize vehicle flow while neglecting pedestrian needs and safety. This paper proposes a single-agent deep reinforcement learning framework—built upon Proximal Policy Optimization (PPO)—designed explicitly for joint vehicle-pedestrian optimization. Leveraging real-world Wi-Fi logs and video analytics, we construct a multi-source traffic demand model and deploy dynamic, coordinated signal control along an urban eight-intersection arterial corridor. Our key innovations include a unified state representation capturing both vehicular and pedestrian dynamics, and a fairness-aware reward function that jointly optimizes traffic efficiency and pedestrian service across heterogeneous traffic regimes. Experiments demonstrate that, compared to fixed-time signaling, our approach reduces average pedestrian and vehicle waiting times by up to 67% and 52%, respectively, while decreasing total accumulated waiting times for the two groups by up to 67% and 53%. These results also confirm generalization to traffic conditions unseen during training.
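To make the fairness-aware reward concrete, here is a minimal sketch of one plausible formulation: a negative weighted sum of vehicle and pedestrian cumulative waiting times. The function name, the weighting parameter `alpha`, and the per-group aggregation are illustrative assumptions, not the paper's actual reward design.

```python
def joint_reward(vehicle_waits, pedestrian_waits, alpha=0.5):
    """Hypothetical fairness-aware reward for a traffic-signal RL agent.

    vehicle_waits / pedestrian_waits: lists of per-user accumulated
    waiting times (seconds) at the current decision step.
    alpha: assumed trade-off weight between the two road-user groups
    (0 = pedestrians only, 1 = vehicles only).

    Returns a negative scalar, so minimizing waiting maximizes reward.
    """
    total_vehicle_wait = sum(vehicle_waits)
    total_pedestrian_wait = sum(pedestrian_waits)
    return -(alpha * total_vehicle_wait
             + (1.0 - alpha) * total_pedestrian_wait)


# Example: two waiting vehicles and one waiting pedestrian
reward = joint_reward([10.0, 20.0], [30.0], alpha=0.5)  # -30.0
```

Setting `alpha` below 0.5 would bias the policy toward pedestrian service, which is one simple way a reward could encode the fairness objective the summary describes.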
📝 Abstract
Reinforcement learning (RL) holds significant promise for adaptive traffic signal control. While existing RL-based methods demonstrate effectiveness in reducing vehicular congestion, their predominant focus on vehicle-centric optimization leaves pedestrian mobility needs and safety challenges unaddressed. In this paper, we present a deep RL framework for adaptive control of eight traffic signals along a real-world urban corridor, jointly optimizing pedestrian and vehicular efficiency. Our single-agent policy is trained on real-world pedestrian and vehicle demand data derived from Wi-Fi logs and video analysis. The results demonstrate significant performance improvements over traditional fixed-time signals, reducing average wait times per pedestrian and per vehicle by up to 67% and 52%, respectively, while simultaneously decreasing total accumulated wait times for the two groups by up to 67% and 53%. Additionally, our results demonstrate generalization across varying traffic demands, including conditions entirely unseen during training, validating RL's potential for developing transportation systems that serve all road users.