Multi-Timescale Hierarchical Reinforcement Learning for Unified Behavior and Control of Autonomous Driving

📅 2025-06-30
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing reinforcement learning (RL) approaches for autonomous driving suffer from structural limitations in policy design: short-horizon control policies are vulnerable to output instability, while long-horizon goal-directed policies struggle to jointly optimize behavioral planning and low-level control. To address this, we propose a multi-timescale hierarchical RL framework: a high-level policy generates long-horizon motion guidance, while a low-level policy produces short-horizon control commands. We introduce a hybrid action representation and an incremental state update mechanism to support multimodal driving behavior modeling. Additionally, multi-scale safety constraints ensure both global trajectory safety and local control safety. Through end-to-end joint training and simulation-based safety optimization, our method achieves significant improvements in driving efficiency, action consistency, and safety in simulator-based and HighD dataset-based highway scenarios, marking the first RL-based approach to unify behavioral decision-making and low-level control optimization.

๐Ÿ“ Abstract
Reinforcement Learning (RL) is increasingly used in autonomous driving (AD) and shows clear advantages. However, most RL-based AD methods overlook policy structure design. An RL policy that only outputs short-timescale vehicle control commands results in fluctuating driving behavior due to fluctuations in network outputs, while one that only outputs long-timescale driving goals cannot achieve unified optimality of driving behavior and control. Therefore, we propose a multi-timescale hierarchical reinforcement learning approach. Our approach adopts a hierarchical policy structure, where high- and low-level RL policies are trained jointly to produce long-timescale motion guidance and short-timescale control commands, respectively. Therein, motion guidance is explicitly represented by hybrid actions to capture multimodal driving behaviors on structured roads and to support incremental low-level extended-state updates. Additionally, a hierarchical safety mechanism is designed to ensure multi-timescale safety. Evaluation in simulator-based and HighD dataset-based highway multi-lane scenarios demonstrates that our approach significantly improves AD performance, effectively increasing driving efficiency, action consistency, and safety.
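The two-timescale structure described in the abstract can be sketched as a simple nested loop: the high-level policy emits a hybrid action (a discrete behavior plus continuous parameters) once every K low-level steps, and the low-level policy emits a control command at every step conditioned on that guidance. This is a minimal illustrative sketch only; the class names, the K-step ratio, the hybrid-action fields, and the toy dynamics are assumptions, not the paper's actual networks.

```python
import random

K = 5  # assumed number of low-level steps per high-level decision

class HighLevelPolicy:
    """Long-timescale policy: outputs motion guidance as a hybrid action,
    a discrete behavior (lane choice) plus a continuous target speed."""
    def act(self, state):
        behavior = random.choice(["keep", "left", "right"])   # discrete part
        target_speed = 20.0 + random.uniform(-2.0, 2.0)       # continuous part
        return {"behavior": behavior, "target_speed": target_speed}

class LowLevelPolicy:
    """Short-timescale policy: outputs control commands conditioned on the
    current state and the high-level motion guidance."""
    def act(self, state, guidance):
        accel = 0.1 * (guidance["target_speed"] - state["speed"])
        steer = {"keep": 0.0, "left": -0.05, "right": 0.05}[guidance["behavior"]]
        return {"accel": accel, "steer": steer}

def rollout(steps=20):
    high, low = HighLevelPolicy(), LowLevelPolicy()
    state = {"speed": 18.0}
    guidance = None
    trace = []
    for t in range(steps):
        if t % K == 0:                  # re-plan on the long timescale
            guidance = high.act(state)
        cmd = low.act(state, guidance)  # control on the short timescale
        state["speed"] += cmd["accel"]  # toy longitudinal dynamics
        trace.append((t, guidance["behavior"], cmd["accel"]))
    return trace

trace = rollout()
```

Within each K-step block the guidance stays fixed, which is what damps the step-to-step output fluctuation the abstract attributes to purely short-timescale policies.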
Problem

Research questions and friction points this paper is trying to address.

Most RL-based AD methods overlook policy structure design
Short-timescale control commands cause fluctuating driving behavior
No unified optimization of driving behavior and control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-timescale hierarchical reinforcement learning approach
Hybrid actions for multimodal driving behaviors
Hierarchical safety mechanism for multi-timescale safety
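The hierarchical safety mechanism listed above can be pictured as one check per timescale: a long-timescale check on the guided trajectory and a short-timescale check on the control command. The sketch below is a hypothetical two-level filter; the headway threshold, actuator bounds, and function names are assumptions, not the paper's actual constraints.

```python
def high_level_safe(guidance, gap_to_leader, min_headway=2.0):
    """Long-timescale check: reject motion guidance whose target speed
    would violate a minimum time headway to the leading vehicle."""
    target = guidance["target_speed"]
    return target <= 0 or gap_to_leader / max(target, 1e-6) >= min_headway

def low_level_safe(cmd, a_max=3.0, steer_max=0.3):
    """Short-timescale check: clamp control commands to actuator bounds."""
    return {
        "accel": max(-a_max, min(a_max, cmd["accel"])),
        "steer": max(-steer_max, min(steer_max, cmd["steer"])),
    }

# Example: guidance at 25 m/s with only 40 m of gap fails the 2 s headway
# check, and an aggressive raw command gets clamped to the bounds.
guidance = {"behavior": "keep", "target_speed": 25.0}
ok = high_level_safe(guidance, gap_to_leader=40.0)
safe_cmd = low_level_safe({"accel": 5.0, "steer": -0.5})
```

Splitting the checks this way mirrors the paper's distinction between global trajectory safety and local control safety: an unsafe plan is rejected before execution, and an unsafe command is bounded at execution time.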
🔎 Similar Papers
No similar papers found.
Guizhe Jin
School of Automotive Studies, Tongji University, Shanghai 201804, China
Zhuoren Li
Ph.D. Candidate
autonomous vehicles, intelligent transportation, motion planning, reinforcement learning
Bo Leng
School of Automotive Studies, Tongji University, Shanghai 201804, China
Ran Yu
GESIS – Leibniz Institute for the Social Sciences
knowledge graph, search as learning, information retrieval, spatial-temporal data analysis
Lu Xiong
School of Automotive Studies, Tongji University, Shanghai 201804, China