Bootstrapping Reinforcement Learning with Sub-optimal Policies for Autonomous Driving

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning (RL) for autonomous driving suffers from low sample efficiency, poor exploration capability, and heavy reliance on high-quality expert demonstrations. To address these challenges, this paper proposes a guided training framework leveraging suboptimal demonstration policies. Specifically, a rule-based, suboptimal lane-changing controller serves as a behavioral prior, integrated with the Soft Actor-Critic (SAC) algorithm to form a hybrid policy learning architecture. This integration enhances exploratory behavior and accelerates convergence without requiring expert-level demonstration data. Experimental results demonstrate that the proposed method significantly improves training efficiency—reducing convergence steps by approximately 35%—and boosts driving performance, increasing average task success rate by 22%. Moreover, the approach exhibits promising generalization across diverse driving scenarios. Overall, it establishes a novel paradigm for low-cost, robust RL-based autonomous driving control, mitigating dependence on costly expert supervision while maintaining safety-critical performance.

📝 Abstract
Automated vehicle control using reinforcement learning (RL) has attracted significant attention due to its potential to learn driving policies through environment interaction. However, RL agents often face training challenges in sample efficiency and effective exploration, making it difficult to discover an optimal driving strategy. To address these issues, we propose guiding the RL driving agent with a demonstration policy that need not be a highly optimized or expert-level controller. Specifically, we integrate a rule-based lane-change controller with the Soft Actor-Critic (SAC) algorithm to enhance exploration and learning efficiency. Our approach demonstrates improved driving performance and can be extended to other driving scenarios that can similarly benefit from demonstration-based guidance.
Problem

Research questions and friction points this paper is trying to address.

Improving sample efficiency in RL training
Enhancing exploration for autonomous driving policies
Integrating sub-optimal controllers with SAC algorithm
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating rule-based lane change controller with SAC
Guiding RL agent with sub-optimal demonstration policy
Enhancing exploration and learning efficiency through demonstrations
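The guidance scheme described above can be sketched as a mixing rule: during exploration, the agent sometimes executes the rule-based (suboptimal) lane-change controller instead of its own SAC actor, and all resulting transitions go into a shared off-policy replay buffer. This is a minimal illustrative sketch, not the paper's implementation; all class and parameter names (`RuleBasedLaneChangeController`, `SACPolicyStub`, `guide_prob`) are hypothetical.

```python
# Hedged sketch of demonstration-guided exploration for SAC: with probability
# `guide_prob` the behavioral prior (a rule-based lane-change controller) picks
# the action; otherwise the learned policy does. SAC is off-policy, so it can
# train on transitions from either source. Names here are illustrative only.
import random
from collections import deque

class RuleBasedLaneChangeController:
    """Toy suboptimal prior: proportional steering toward the target lane center."""
    def act(self, obs):
        lateral_error = obs["target_lane_center"] - obs["lateral_pos"]
        steer = max(-1.0, min(1.0, 0.5 * lateral_error))  # clipped P-control
        return (steer, 0.3)  # (steering, throttle)

class SACPolicyStub:
    """Stand-in for the learned SAC actor (random actions for this sketch)."""
    def act(self, obs):
        return (random.uniform(-1.0, 1.0), random.uniform(0.0, 1.0))

def guided_action(obs, sac, prior, guide_prob):
    """Mix the demonstration policy into exploration with probability guide_prob."""
    if random.random() < guide_prob:
        return prior.act(obs), True   # action from the suboptimal prior
    return sac.act(obs), False        # action from the learned policy

# Collect guided rollout transitions into a shared replay buffer.
prior, sac = RuleBasedLaneChangeController(), SACPolicyStub()
replay_buffer = deque(maxlen=100_000)
obs = {"lateral_pos": 0.8, "target_lane_center": 0.0}
for _ in range(100):
    action, from_prior = guided_action(obs, sac, prior, guide_prob=0.5)
    replay_buffer.append((obs, action, from_prior))  # SAC trains off-policy on all of it
```

A common refinement of this pattern is to anneal `guide_prob` toward zero as training progresses, so the agent relies on the prior early on and on its own policy once it outperforms the demonstrations.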
Zhihao Zhang
Electrical and Computer Engineering, The Ohio State University
Chengyang Peng
The Ohio State University
Robotics
Ekim Yurtsever
The Ohio State University
Machine Learning · Computer Vision · Automated Driving Systems
Keith A. Redmill
Department of Electrical and Computer Engineering, The Ohio State University