AI Summary
Existing reinforcement learning (RL) approaches for autonomous driving suffer from structural limitations in policy design: short-horizon control policies are vulnerable to output instability, while long-horizon goal-directed policies struggle to jointly optimize behavioral planning and low-level control. To address this, we propose a multi-timescale hierarchical RL framework: a high-level policy generates long-horizon motion guidance, while a low-level policy produces short-horizon control commands. We introduce a hybrid action representation and an incremental state update mechanism to support multimodal driving behavior modeling. Additionally, multi-scale safety constraints ensure both global trajectory safety and local control safety. Through end-to-end joint training and simulation-based safety optimization, our method achieves significant improvements in driving efficiency, action consistency, and safety on the HighD highway dataset, marking the first RL-based approach to unify behavioral decision-making and low-level control optimization.
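The multi-timescale structure described above can be sketched as a nested control loop: the high-level policy replans motion guidance every K environment steps, while the low-level policy issues a control command at every step, conditioned on the current guidance. The sketch below is illustrative only; all names, dimensions, and the replanning interval `K` are hypothetical placeholders, and random outputs stand in for the trained high- and low-level RL policies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): state, guidance, and control sizes.
STATE_DIM, GUIDE_DIM, CTRL_DIM = 8, 3, 2
K = 10  # high-level policy acts every K low-level steps (long vs. short timescale)

def high_level_policy(state):
    """Stand-in for the high-level RL policy: emits long-horizon motion guidance."""
    return rng.standard_normal(GUIDE_DIM)

def low_level_policy(state, guidance):
    """Stand-in for the low-level RL policy: emits a short-horizon control
    command conditioned on the current guidance."""
    return rng.standard_normal(CTRL_DIM)

def rollout(steps=30):
    state = np.zeros(STATE_DIM)
    guidance = None
    controls = []
    for t in range(steps):
        if t % K == 0:                  # long timescale: refresh motion guidance
            guidance = high_level_policy(state)
        u = low_level_policy(state, guidance)  # short timescale: every step
        controls.append(u)
        state = state + 0.01 * rng.standard_normal(STATE_DIM)  # toy dynamics
    return controls

controls = rollout()
print(len(controls))  # one control command per low-level step
```

The key design point is the timescale separation: guidance changes only every K steps, which damps the step-to-step fluctuation a purely short-horizon policy would exhibit, while the per-step low-level policy retains fine-grained control authority.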
Abstract
Reinforcement Learning (RL) is increasingly used in autonomous driving (AD) and shows clear advantages. However, most RL-based AD methods overlook policy structure design. An RL policy that only outputs short-timescale vehicle control commands produces fluctuating driving behavior, since network outputs vary from step to step, while one that only outputs long-timescale driving goals cannot jointly optimize driving behavior and control. Therefore, we propose a multi-timescale hierarchical reinforcement learning approach. Our approach adopts a hierarchical policy structure in which high- and low-level RL policies are trained jointly to produce long-timescale motion guidance and short-timescale control commands, respectively. The motion guidance is explicitly represented by hybrid actions to capture multimodal driving behaviors on structured roads and to support incremental updates of the low-level extended state. Additionally, a hierarchical safety mechanism is designed to ensure multi-timescale safety. Evaluation in simulator-based and HighD-dataset-based highway multi-lane scenarios demonstrates that our approach significantly improves AD performance, effectively increasing driving efficiency, action consistency, and safety.