🤖 AI Summary
Traditional traffic signal control (TSC) relies heavily on handcrafted features and heuristic rules, resulting in poor generalizability across diverse traffic scenarios. To address this, we propose TrafficDojo—the first end-to-end, vision-based TSC framework. TrafficDojo integrates SUMO’s microscopic traffic simulation with MetaDrive’s driving simulator to establish a reproducible, scalable, full-stack vision-driven benchmark, enabling direct learning of optimal signal policies from raw camera video streams. Departing from predefined features, it jointly models perception and control via computer vision and reinforcement learning, unifying classical approaches and state-of-the-art RL algorithms into a coherent baseline suite. We open-source all code and benchmark datasets. Experiments demonstrate significant improvements in environmental adaptability and cross-scenario generalization across multiple intersections, substantially reducing vehicle delay and CO₂ emissions.
📝 Abstract
Traffic signal control (TSC) is crucial for reducing traffic congestion, leading to smoother traffic flow, reduced idling time, and mitigated CO2 emissions. In this study, we explore a computer vision approach to TSC that modulates on-road traffic flows through visual observation. Unlike traditional feature-based approaches, vision-based methods depend far less on heuristics and predefined features, bringing promising potential for end-to-end learning and optimization of traffic signals. To this end, we introduce TrafficDojo, a holistic traffic simulation framework for vision-based TSC and its benchmarking, which integrates the microscopic traffic flow model of SUMO into the driving simulator MetaDrive. The proposed framework offers a versatile traffic environment for in-depth analysis and comprehensive evaluation of traffic signal controllers across diverse traffic conditions and scenarios. We establish and compare baseline algorithms, including both traditional and Reinforcement Learning (RL) approaches. This work offers insights into the design and development of vision-based TSC approaches and opens up new research opportunities. All code and baselines will be made publicly available.
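To make the vision-based RL formulation concrete, the sketch below shows a minimal, hypothetical environment interface of the kind such a benchmark exposes: the observation is a raw camera frame and the action selects a signal phase. The class name, reward stub, and frame rendering here are illustrative placeholders, not TrafficDojo's actual API.

```python
import numpy as np


class VisionTSCEnv:
    """Minimal sketch of a vision-based traffic-signal-control environment.

    Hypothetical interface for illustration only; the real framework's API
    may differ. Observations are raw camera frames (H x W x 3), and the
    action picks one of the intersection's signal phases.
    """

    def __init__(self, n_phases=4, frame_shape=(84, 84, 3), episode_len=100):
        self.n_phases = n_phases
        self.frame_shape = frame_shape
        self.episode_len = episode_len
        self.t = 0

    def reset(self):
        self.t = 0
        return self._render_frame()

    def step(self, phase):
        assert 0 <= phase < self.n_phases
        self.t += 1
        # Reward: negative vehicle delay (stubbed with random values here;
        # a real simulator would compute delay from the traffic state).
        reward = -float(np.random.rand())
        done = self.t >= self.episode_len
        return self._render_frame(), reward, done, {}

    def _render_frame(self):
        # Stand-in for a camera image rendered by the driving simulator.
        return np.random.randint(0, 256, self.frame_shape, dtype=np.uint8)


# Random-policy rollout: the trivial baseline any learned controller
# (traditional or RL) must outperform on delay-based metrics.
env = VisionTSCEnv()
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = np.random.randint(env.n_phases)
    obs, r, done, _ = env.step(action)
    total_reward += r
```

An end-to-end controller would replace the random action with a policy network mapping the frame directly to a phase choice, which is precisely the perception-plus-control coupling the framework is designed to evaluate.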