🤖 AI Summary
To address the challenge in multi-lane autonomous driving where reinforcement learning struggles to simultaneously satisfy safety constraints and achieve high driving efficiency, this paper proposes a safety-oriented Harmonic Policy Iteration (HPI) framework. Methodologically, it introduces: (1) a harmonic gradient mechanism that dynamically fuses safety and efficiency gradients to enable conflict-minimizing policy updates; (2) the integration of Distributional Soft Actor-Critic (DSAC) with HPI, yielding the end-to-end algorithm DSAC-H; and (3) a comprehensive evaluation in a high-fidelity multi-lane simulation environment. Results demonstrate near-zero constraint-violation rates, improved training stability, and faster convergence than baseline methods including SAC and TD3. This work establishes a scalable, robust paradigm for constrained reinforcement learning in autonomous driving.
📝 Abstract
Reinforcement learning (RL), known for its self-evolution capability, offers a promising approach to training high-level autonomous driving systems. However, handling constraints remains a significant challenge for existing RL algorithms, particularly in real-world applications. In this paper, we propose a new safety-oriented training technique called harmonic policy iteration (HPI). At each RL iteration, it first calculates two policy gradients associated with efficient driving and safety constraints, respectively. Then, a harmonic gradient is derived to update the policy, minimizing conflicts between the two gradients and consequently enabling a more balanced and stable training process. Furthermore, we adopt the state-of-the-art distributional soft actor-critic (DSAC) algorithm as the backbone and integrate it with our HPI to develop a new safe RL algorithm, DSAC-H. Extensive simulations in multi-lane scenarios demonstrate that DSAC-H achieves efficient driving performance with near-zero safety constraint violations.
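The abstract describes fusing an efficiency gradient and a safety gradient into a single conflict-minimizing update direction. The paper does not spell out the fusion rule here, so the sketch below uses one plausible instantiation as an assumption: when the two gradients agree (non-negative inner product) they are averaged, and when they conflict the update is the minimum-norm point of their convex hull, which cancels the conflicting component so the result does not decrease either objective to first order. The function name `harmonic_gradient` and this specific rule are illustrative, not the authors' exact method.

```python
import numpy as np

def harmonic_gradient(g_eff: np.ndarray, g_safe: np.ndarray) -> np.ndarray:
    """Illustrative conflict-minimizing fusion of two policy gradients.

    If g_eff . g_safe >= 0 the gradients are compatible and we simply
    average them. Otherwise we pick the minimum-norm convex combination
    a * g_eff + (1 - a) * g_safe, a common two-objective fusion rule
    (assumed here; the actual HPI rule may differ).
    """
    if np.dot(g_eff, g_safe) >= 0.0:
        return 0.5 * (g_eff + g_safe)
    # Minimize ||a * g_eff + (1 - a) * g_safe||^2 over a in [0, 1]:
    # closed form a = (g_safe . (g_safe - g_eff)) / ||g_eff - g_safe||^2
    diff = g_eff - g_safe
    a = float(np.clip(np.dot(g_safe, g_safe - g_eff) / np.dot(diff, diff), 0.0, 1.0))
    return a * g_eff + (1.0 - a) * g_safe

# Fully opposed gradients cancel; compatible gradients are averaged.
print(harmonic_gradient(np.array([1.0, 0.0]), np.array([-1.0, 0.0])))  # [0. 0.]
print(harmonic_gradient(np.array([1.0, 0.0]), np.array([0.0, 1.0])))   # [0.5 0.5]
```

The fused direction has a non-negative inner product with both input gradients, which is the "conflict-minimizing" property the abstract attributes to the harmonic gradient: a step along it does not worsen either the driving-efficiency objective or the safety-constraint objective locally.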