AI Summary
Existing reinforcement learning (RL) approaches for autonomous driving suffer from structural limitations in policy design: short-horizon control policies are vulnerable to output instability, while long-horizon goal-directed policies struggle to jointly optimize behavioral planning and low-level control. To address this, we propose a multi-timescale hierarchical RL framework: a high-level policy generates long-horizon motion guidance, while a low-level policy produces short-horizon control commands. We introduce a hybrid action representation and an incremental state update mechanism to support multimodal driving behavior modeling. Additionally, multi-scale safety constraints ensure both global trajectory safety and local control safety. Through end-to-end joint training and simulation-based safety optimization, our method achieves significant improvements in driving efficiency, action consistency, and safety on the HighD highway dataset, marking the first RL-based approach to unify behavioral decision-making and low-level control optimization.
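The multi-timescale structure described above can be sketched as a nested control loop: the high-level policy replans motion guidance every K environment steps, while the low-level policy issues a control command at every step, conditioned on the current guidance. The sketch below is illustrative only; all names, dimensions, and the replanning interval `K` are hypothetical placeholders, and random outputs stand in for the trained high- and low-level RL policies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): state, guidance, and control sizes.
STATE_DIM, GUIDE_DIM, CTRL_DIM = 8, 3, 2
K = 10  # high-level policy acts every K low-level steps (long vs. short timescale)

def high_level_policy(state):
    """Stand-in for the high-level RL policy: emits long-horizon motion guidance."""
    return rng.standard_normal(GUIDE_DIM)

def low_level_policy(state, guidance):
    """Stand-in for the low-level RL policy: emits a short-horizon control
    command conditioned on the current guidance."""
    return rng.standard_normal(CTRL_DIM)

def rollout(steps=30):
    state = np.zeros(STATE_DIM)
    guidance = None
    controls = []
    for t in range(steps):
        if t % K == 0:                  # long timescale: refresh motion guidance
            guidance = high_level_policy(state)
        u = low_level_policy(state, guidance)  # short timescale: every step
        controls.append(u)
        state = state + 0.01 * rng.standard_normal(STATE_DIM)  # toy dynamics
    return controls

controls = rollout()
print(len(controls))  # one control command per low-level step
```

The key design point is the timescale separation: guidance changes only every K steps, which damps the step-to-step fluctuation a purely short-horizon policy would exhibit, while the per-step low-level policy retains fine-grained control authority.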
Abstract
Reinforcement Learning (RL) is increasingly used in autonomous driving (AD) and shows clear advantages. However, most RL-based AD methods overlook policy structure design. An RL policy that only outputs short-timescale vehicle control commands produces fluctuating driving behavior, since network outputs vary from step to step, while one that only outputs long-timescale driving goals cannot jointly optimize driving behavior and control. Therefore, we propose a multi-timescale hierarchical reinforcement learning approach. Our approach adopts a hierarchical policy structure in which high- and low-level RL policies are trained jointly to produce long-timescale motion guidance and short-timescale control commands, respectively. The motion guidance is explicitly represented by hybrid actions to capture multimodal driving behaviors on structured roads and to support incremental updates of the low-level extended state. Additionally, a hierarchical safety mechanism is designed to ensure multi-timescale safety. Evaluation in simulator-based and HighD-dataset-based highway multi-lane scenarios demonstrates that our approach significantly improves AD performance, effectively increasing driving efficiency, action consistency, and safety.