DreamerV3 for Traffic Signal Control: Hyperparameter Tuning and Performance

πŸ“… 2025-03-04
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the high sample complexity and low training efficiency of reinforcement learning (RL) in traffic signal control (TSC), this paper introduces DreamerV3, a world-model-based RL framework, to TSC for the first time. The agent is trained in SUMO microsimulation with queue-length-based state representations and reward signals, targeting corridor-level control. Key contributions: (1) systematic validation of DreamerV3's effectiveness and generalization capability in TSC; (2) empirical evidence that small models combined with moderate imagination rollout (training) ratios cut hyperparameter tuning time by over 50%; (3) zero-shot transfer across origin-destination (OD) demand patterns without retraining or adaptation; and (4) the finding that larger models yield only marginal gains in data efficiency, while excessively high imagination ratios impede convergence. Together, these results offer practical guidance for designing sample-efficient, generalizable TSC agents.
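The queue-length-based state and reward the summary describes can be made concrete with SUMO's TraCI interface. Below is a minimal sketch assuming a hypothetical corridor network: the lane IDs and config path are illustrative stand-ins, and the paper's exact state and reward shaping may differ.

```python
import traci  # ships with SUMO; requires SUMO_HOME/tools on the Python path

# Hypothetical incoming lanes at one corridor intersection (illustrative IDs).
INCOMING_LANES = ["edge_in_0", "edge_in_1", "edge_in_2", "edge_in_3"]

def queue_state():
    # State: per-lane queue length, measured as halting vehicles (speed < 0.1 m/s).
    return [traci.lane.getLastStepHaltingNumber(lane) for lane in INCOMING_LANES]

def queue_reward(state):
    # Reward: negative total queue length, so clearing queues is rewarded.
    return -float(sum(state))

traci.start(["sumo", "-c", "corridor.sumocfg"])  # illustrative config path
for _ in range(3600):  # one simulated hour at 1 s steps
    traci.simulationStep()
    s = queue_state()
    r = queue_reward(s)
traci.close()
```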

πŸ“ Abstract
Reinforcement learning (RL) has become a widely investigated technology for developing smart traffic signal control (TSC) strategies. However, current RL algorithms require excessive interaction with the environment to learn effective policies, making them impractical for large-scale tasks. The DreamerV3 algorithm offers compelling properties for policy learning: it distills general dynamics knowledge about the environment from past experience and predicts the future outcomes of candidate actions, reducing environment interaction through imagination training. In this paper, a corridor TSC model is trained with the DreamerV3 algorithm to explore the benefits of world models for TSC strategy learning. In the RL environment design, both the state and reward functions are defined in terms of queue length, and the action is designed to manage queues, and hence congestion, efficiently. Using the SUMO simulation platform, two DreamerV3 hyperparameters (training ratio and model size) were tuned and analyzed across different origin-destination (OD) matrix scenarios. We found that choosing a smaller model size and initially trying several medium training ratios substantially reduces the time spent on hyperparameter tuning. The approach also generalizes: it solves two TSC task scenarios with the same hyperparameters. Regarding DreamerV3's claimed data efficiency, because the episode reward curve fluctuates strongly early in training, we can only confirm that larger model sizes exhibit modest data-efficiency gains, and we found no evidence that increasing the training ratio accelerates convergence.
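The abstract's tuning procedure reduces to a two-knob sweep over model size and training ratio. Here is a minimal sketch of such a sweep; `make_agent` and `evaluate` are hypothetical stand-ins for a DreamerV3 implementation and a SUMO environment wrapper, and the candidate values are illustrative rather than the paper's exact grid.

```python
from itertools import product

# Illustrative candidates; per the paper's finding, start with a small model
# and try a few medium training ratios first.
MODEL_SIZES = ["small", "medium", "large"]
TRAIN_RATIOS = [32, 64, 128]  # imagined training steps per real environment step

def sweep(make_agent, evaluate):
    """Evaluate every (model size, training ratio) pair and return the best."""
    scores = {}
    for size, ratio in product(MODEL_SIZES, TRAIN_RATIOS):
        agent = make_agent(model_size=size, train_ratio=ratio)
        scores[(size, ratio)] = evaluate(agent)  # e.g., mean episode reward
    return max(scores, key=scores.get)
```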
Problem

Research questions and friction points this paper is trying to address.

Reducing the excessive environment interaction RL needs for traffic signal control.
Exploring DreamerV3's benefits for TSC strategy learning.
Tuning hyperparameters (model size and training ratio) to improve training efficiency on TSC tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

The DreamerV3 world model reduces environment interaction through imagination training (see the toy sketch below).
World models improve traffic signal control strategy learning.
Hyperparameter tuning balances model size and training ratio.
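To make "reduces environment interaction" concrete: a world model lets the agent generate imagined rollouts, so each real simulator step can support many policy updates. The toy sketch below illustrates that ratio with a tabular one-step model; it is a conceptual illustration only, not DreamerV3's latent recurrent state-space model.

```python
class ToyWorldModel:
    """Tabular one-step dynamics model: (state, action) -> (next_state, reward)."""

    def __init__(self):
        self.transitions = {}

    def observe(self, s, a, s_next, r):
        # Learn from a single real environment transition.
        self.transitions[(s, a)] = (s_next, r)

    def imagine(self, s, policy, horizon):
        # Roll out entirely inside the model: zero simulator calls.
        total_reward = 0.0
        for _ in range(horizon):
            a = policy(s)
            if (s, a) not in self.transitions:
                break  # model has not yet seen this transition
            s, r = self.transitions[(s, a)]
            total_reward += r
        return total_reward

# The tuned "training ratio" is, roughly, how many imagined updates the agent
# performs per real environment step; the paper finds that pushing this ratio
# too high impedes convergence, as imagination outruns the model's accuracy.
TRAIN_RATIO = 64
```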
Qiang Li
College of Urban Transportation and Logistics, Shenzhen Technology University, Shenzhen, Guangdong 518118, China

Yinhan Lin
College of Urban Transportation and Logistics, Shenzhen Technology University, Shenzhen, Guangdong 518118, China

Qin Luo
The Chinese University of Hong Kong
VLSI CAD · Optimization · Machine Learning · Model Compression

Lina Yu
Shenzhen Technology University
Humanitarian logistics · Resource allocation · Dynamic programming · Reinforcement learning