🤖 AI Summary
This study addresses two problems in traffic signal control. Traditional controllers adapt poorly to dynamic traffic flows, and existing reinforcement learning approaches rely on delay- or queue-based rewards, which often lead to myopic decisions and unstable training and fail to balance traffic efficiency against carbon emissions. To overcome these challenges, the authors propose a Momentum-Based Reward Function (MBRF) that rewards sustained vehicle movement rather than only penalizing congestion, so that throughput and emissions are optimized jointly. Implemented within a deep reinforcement learning framework and evaluated in SUMO, the method yields better throughput-emission trade-offs and more stable training than delay- and queue-based reward schemes, and it also outperforms the classical Max Pressure and Longest Queue First (LQF) controllers.
📝 Abstract
Urban traffic congestion is a growing global issue that contributes significantly to long commute times and environmental pollution. Traditional traffic signal control systems often fail to adapt to dynamic traffic conditions, whereas adaptive traffic signal control can improve urban traffic without changing road infrastructure. Deep Reinforcement Learning (DRL) has shown strong performance for this task, but existing delay- and queue-based rewards often produce short-sighted or unstable policies. This paper proposes a Momentum-Based Reward Function (MBRF) that encourages vehicles to keep moving rather than penalizing congestion alone. The method is evaluated in SUMO (Simulation of Urban MObility) using standard traffic metrics: waiting time, queue length, throughput, and CO₂ emissions. Results show that the proposed reward produces better throughput-emission trade-offs and more stable learning behavior than delay- or queue-based rewards, as well as classical controllers such as Max Pressure and Longest Queue First (LQF).
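The abstract does not state the MBRF formula, so the following Python sketch is only an illustration of the general idea, not the authors' definition. It contrasts a queue-style penalty with a hypothetical momentum-style reward computed as mass-weighted speed, normalized by an assumed speed limit. The function names, the halting threshold `HALT_SPEED`, the default `v_max`, and the unit-mass default are all assumptions made for this example.

```python
from typing import Optional, Sequence

HALT_SPEED = 0.1  # m/s; assumed threshold below which a vehicle counts as queued


def queue_reward(speeds: Sequence[float]) -> float:
    """Queue-style reward: the negative number of halted vehicles."""
    return -float(sum(1 for v in speeds if v < HALT_SPEED))


def momentum_style_reward(
    speeds: Sequence[float],
    masses: Optional[Sequence[float]] = None,
    v_max: float = 13.89,  # assumed speed limit in m/s (about 50 km/h)
) -> float:
    """Hypothetical momentum-style reward (not the paper's formula).

    Returns total momentum sum(m * v) divided by the momentum the same
    vehicles would have at v_max, so the value lies in [0, 1] whenever
    no vehicle exceeds v_max.
    """
    if not speeds:
        return 0.0
    if masses is None:
        masses = [1.0] * len(speeds)  # assume unit mass per vehicle
    total_momentum = sum(m * v for m, v in zip(masses, speeds))
    return total_momentum / (sum(masses) * v_max)


# Two intersection states with the same queue but different flow.
slow_flow = [0.0, 0.0, 2.0, 2.0]
fast_flow = [0.0, 0.0, 10.0, 10.0]

print(queue_reward(slow_flow), queue_reward(fast_flow))
print(momentum_style_reward(slow_flow), momentum_style_reward(fast_flow))
```

In this toy case both states have two halted vehicles, so the queue-style reward cannot tell them apart, while the momentum-style reward is higher for the state in which the moving vehicles travel faster. That is the kind of distinction a reward that "encourages vehicles to keep moving" is meant to capture. In a SUMO experiment the speeds would come from the simulator rather than hard-coded lists.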