🤖 AI Summary
To address the poor generalizability and limited interpretability of deep reinforcement learning (DRL) controllers in complex traffic scenarios—particularly under long-horizon decision-making, sparse rewards, and multi-agent interactions—this paper proposes a hierarchical reinforcement learning (HRL) framework that decouples control into high-level strategic planning and low-level motion control. A novel two-stage decoupled training mechanism is introduced: the high-level policy optimizes long-term delayed rewards, while the low-level controller executes precise, real-time longitudinal and lateral maneuvers. This architectural separation significantly enhances policy interpretability and environmental adaptability. Evaluated in highway simulation environments, the proposed HRL framework achieves a 37% improvement in task completion rate and a 52% increase in long-distance lane-changing success rate over standard single-layer DRL baselines. Moreover, it demonstrates markedly improved robustness to sparse reward signals and dynamic multi-agent interactions.
📝 Abstract
Developing an automated driving system capable of navigating complex traffic environments remains a formidable challenge. Unlike rule-based or supervised-learning-based methods, controllers based on Deep Reinforcement Learning (DRL) eliminate the need for domain-specific knowledge and datasets, providing adaptability to diverse scenarios. Nonetheless, existing studies on DRL-based controllers commonly focus on driving scenarios with simple traffic patterns, which limits their ability to handle complex environments with delayed, long-term rewards and compromises the generalizability of their findings. In response to these limitations, our research introduces a pioneering hierarchical framework that decomposes intricate decision-making problems into manageable and interpretable subtasks. We adopt a two-step training process that trains the high-level and low-level controllers separately: the high-level controller exhibits enhanced exploration potential with long-term delayed rewards, while the low-level controller provides longitudinal and lateral control using short-term instantaneous rewards. Through simulation experiments, we demonstrate the superiority of our hierarchical controller in managing complex highway driving situations.
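The two-level decomposition described above can be sketched as a control loop in which a high-level policy picks a maneuver at a low frequency and a low-level controller translates it into per-step longitudinal and lateral actions. The sketch below is purely illustrative: the environment, the decision period `HIGH_LEVEL_PERIOD`, and all function names are assumptions, not the paper's actual implementation or reward design.

```python
import random

HIGH_LEVEL_PERIOD = 5  # steps between high-level decisions (assumed value)

def high_level_policy(state):
    """Choose a strategic maneuver; stands in for the trained high-level policy."""
    return random.choice(["keep_lane", "change_left", "change_right"])

def low_level_controller(state, maneuver):
    """Map (state, maneuver) to a (acceleration, steering) action each step."""
    steering = {"keep_lane": 0.0, "change_left": 0.2, "change_right": -0.2}[maneuver]
    return (1.0, steering)  # placeholder longitudinal/lateral command

class ToyHighwayEnv:
    """Minimal stand-in environment for demonstration only."""
    def reset(self):
        self.t = 0
        return {"t": 0}

    def step(self, action):
        self.t += 1
        _accel, steer = action
        reward = 1.0 - abs(steer)  # instantaneous shaping reward for the low level
        return {"t": self.t}, reward, self.t >= 20

def run_episode(env, horizon=20):
    """One episode: infrequent strategic choices, per-step motion control."""
    state = env.reset()
    maneuver, total_reward = None, 0.0
    for t in range(horizon):
        if t % HIGH_LEVEL_PERIOD == 0:  # high-level decision at a coarser timescale
            maneuver = high_level_policy(state)
        action = low_level_controller(state, maneuver)
        state, reward, done = env.step(action)
        total_reward += reward  # the high-level policy sees only the delayed return
        if done:
            break
    return total_reward
```

In this sketch, the episode return plays the role of the delayed reward used to train the high-level policy, while the per-step reward inside `step` corresponds to the instantaneous signal that trains the low-level controller; the paper's actual reward functions and training algorithms are not specified here.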