Distributional Soft Actor-Critic with Harmonic Gradient for Safe and Efficient Autonomous Driving in Multi-lane Scenarios

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge in multi-lane autonomous driving where reinforcement learning struggles to simultaneously satisfy safety constraints and achieve high driving efficiency, this paper proposes a safety-oriented Harmonic Policy Iteration (HPI) framework. Methodologically, it introduces: (1) a novel harmonic gradient mechanism that dynamically fuses safety and efficiency gradients to enable conflict-minimizing policy updates; (2) the first integration of Distributional Soft Actor-Critic (DSAC) with HPI, yielding the end-to-end algorithm DSAC-H; and (3) comprehensive evaluation in a high-fidelity multi-lane simulation environment. Results demonstrate near-zero constraint violation rates, significantly improved training stability, and faster convergence compared to baseline methods including SAC and TD3. This work establishes a scalable, highly robust paradigm for constrained reinforcement learning in autonomous driving.

📝 Abstract
Reinforcement learning (RL), known for its self-evolution capability, offers a promising approach to training high-level autonomous driving systems. However, handling constraints remains a significant challenge for existing RL algorithms, particularly in real-world applications. In this paper, we propose a new safety-oriented training technique called harmonic policy iteration (HPI). At each RL iteration, it first calculates two policy gradients associated with efficient driving and safety constraints, respectively. Then, a harmonic gradient is derived for policy updating, minimizing conflicts between the two gradients and consequently enabling a more balanced and stable training process. Furthermore, we adopt the state-of-the-art DSAC algorithm as the backbone and integrate it with our HPI to develop a new safe RL algorithm, DSAC-H. Extensive simulations in multi-lane scenarios demonstrate that DSAC-H achieves efficient driving performance with near-zero safety constraint violations.
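The abstract describes the core of HPI: compute separate efficiency and safety policy gradients, then fuse them into a single update direction that minimizes their mutual conflict. The paper's exact harmonic-gradient formula is not reproduced on this page; as an illustrative sketch only, the following shows a PCGrad-style conflict resolution (projecting out the conflicting component when the two gradients have a negative inner product), which captures the same idea of conflict-minimizing gradient fusion. The function name `harmonic_update` and the projection rule are assumptions for illustration, not the authors' actual method.

```python
import numpy as np

def harmonic_update(g_eff, g_safe):
    """Fuse efficiency and safety gradients into one update direction.

    Illustrative sketch (PCGrad-style): when the two gradients conflict
    (negative inner product), the efficiency gradient is projected onto
    the plane orthogonal to the safety gradient, so the combined update
    never points against the safety objective.
    """
    dot = np.dot(g_eff, g_safe)
    if dot < 0:  # gradients conflict: remove the component opposing safety
        g_eff = g_eff - (dot / (np.dot(g_safe, g_safe) + 1e-12)) * g_safe
    return g_eff + g_safe

# Toy example with conflicting gradients:
g_eff = np.array([1.0, 0.0])
g_safe = np.array([-1.0, 1.0])
g = harmonic_update(g_eff, g_safe)  # projected efficiency grad + safety grad
```

After projection, the fused update has a non-negative inner product with the safety gradient, which is the "conflict-minimizing" property the abstract attributes to the harmonic gradient.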
Problem

Research questions and friction points this paper is trying to address.

Handling constraints in RL for autonomous driving
Balancing efficiency and safety in multi-lane scenarios
Minimizing conflicts between policy gradients for stable training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Harmonic policy iteration balances safety and efficiency
DSAC algorithm integrated with harmonic gradient
Minimizes conflicts between driving and safety gradients
Feihong Zhang
School of Vehicle and Mobility, Tsinghua University, Beijing, 100084, China
Guojian Zhan
School of Vehicle and Mobility, Tsinghua University, Beijing, 100084, China
Bin Shuai
Tsinghua University
Reinforcement Learning · Autonomous Vehicle · Optimal Control
Tianyi Zhang
School of Vehicle and Mobility, Tsinghua University, Beijing, 100084, China
Jingliang Duan
University of Science and Technology Beijing
Shengbo Eben Li
School of Vehicle and Mobility, Tsinghua University, Beijing, 100084, China; College of Artificial Intelligence, Tsinghua University, Beijing, 100084, China