🤖 AI Summary
Reinforcement learning (RL) models for multi-city traffic signal control face challenges in cross-domain transfer due to significant environmental heterogeneity, poor generalizability across cities, and underutilization of multi-source data.
Method: This paper proposes a modular modeling and meta-RL-driven experience aggregation framework. It decouples traffic control into perception, decision-making, and execution modules; leverages meta-RL for cross-city policy initialization; and incorporates a domain adaptation mechanism to align heterogeneous city-state distributions.
Contribution/Results: Evaluated on multiple real-world city-scale simulation platforms (e.g., SUMO + OpenStreetMap), the framework reduces cold-start interaction cost in target cities by 37.2%, decreases average vehicle delay by 19.8%, and demonstrates strong zero-shot generalization to unseen cities. Its core innovations—modular neural control architecture and meta-domain co-adaptation—significantly enhance model deployment efficiency and practical applicability.
📝 Abstract
Traffic signal control (TSC) is an important and widely studied direction. Recently, reinforcement learning (RL) methods have been used to solve TSC problems and achieve superior performance over conventional TSC methods. However, applying RL methods to the real world is challenging due to the huge cost of experiments in real-world traffic environments. One possible solution is TSC domain adaptation, which adapts trained models to target environments and reduces the number of interactions and the training cost. However, existing TSC domain adaptation methods still face two major issues: the lack of consideration for differences across cities and the low utilization of multi-city data. To solve aforementioned issues, we propose an approach named Adaptive Modularized Model (AMM). By modularizing TSC problems and network models, we overcome the challenge of possible changes in environmental observations. We also aggregate multi-city experience through meta-learning. We conduct extensive experiments on different cities and show that AMM can achieve excellent performance with limited interactions in target environments and outperform existing methods. We also demonstrate the feasibility and generalizability of our method.