Bootstrapping Reinforcement Learning with Sub-optimal Policies for Autonomous Driving

📅 2025-09-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning (RL) for autonomous driving suffers from low sample efficiency, poor exploration capability, and heavy reliance on high-quality expert demonstrations. To address these challenges, this paper proposes a guided training framework leveraging suboptimal demonstration policies. Specifically, a rule-based, suboptimal lane-changing controller serves as a behavioral prior, integrated with the Soft Actor-Critic (SAC) algorithm to form a hybrid policy learning architecture. This integration enhances exploratory behavior and accelerates convergence without requiring expert-level demonstration data. Experimental results demonstrate that the proposed method significantly improves training efficiency—reducing convergence steps by approximately 35%—and boosts driving performance, increasing average task success rate by 22%. Moreover, the approach exhibits promising generalization across diverse driving scenarios. Overall, it establishes a novel paradigm for low-cost, robust RL-based autonomous driving control, mitigating dependence on costly expert supervision while maintaining safety-critical performance.

📝 Abstract
Automated vehicle control using reinforcement learning (RL) has attracted significant attention due to its potential to learn driving policies through environment interaction. However, RL agents often face training challenges in sample efficiency and effective exploration, making it difficult to discover an optimal driving strategy. To address these issues, we propose guiding the RL driving agent with a demonstration policy that need not be a highly optimized or expert-level controller. Specifically, we integrate a rule-based lane-change controller with the Soft Actor-Critic (SAC) algorithm to enhance exploration and learning efficiency. Our approach demonstrates improved driving performance and can be extended to other driving scenarios that can similarly benefit from demonstration-based guidance.
Problem

Research questions and friction points this paper is trying to address.

Improving sample efficiency in RL training
Enhancing exploration for autonomous driving policies
Integrating sub-optimal controllers with SAC algorithm
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating rule-based lane change controller with SAC
Guiding RL agent with sub-optimal demonstration policy
Enhancing exploration and learning efficiency through demonstrations
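The guidance scheme described above can be sketched as a mixing rule: during exploration, the agent sometimes executes the rule-based (suboptimal) lane-change controller instead of its own SAC actor, and all resulting transitions go into a shared off-policy replay buffer. This is a minimal illustrative sketch, not the paper's implementation; all class and parameter names (`RuleBasedLaneChangeController`, `SACPolicyStub`, `guide_prob`) are hypothetical.

```python
# Hedged sketch of demonstration-guided exploration for SAC: with probability
# `guide_prob` the behavioral prior (a rule-based lane-change controller) picks
# the action; otherwise the learned policy does. SAC is off-policy, so it can
# train on transitions from either source. Names here are illustrative only.
import random
from collections import deque

class RuleBasedLaneChangeController:
    """Toy suboptimal prior: proportional steering toward the target lane center."""
    def act(self, obs):
        lateral_error = obs["target_lane_center"] - obs["lateral_pos"]
        steer = max(-1.0, min(1.0, 0.5 * lateral_error))  # clipped P-control
        return (steer, 0.3)  # (steering, throttle)

class SACPolicyStub:
    """Stand-in for the learned SAC actor (random actions for this sketch)."""
    def act(self, obs):
        return (random.uniform(-1.0, 1.0), random.uniform(0.0, 1.0))

def guided_action(obs, sac, prior, guide_prob):
    """Mix the demonstration policy into exploration with probability guide_prob."""
    if random.random() < guide_prob:
        return prior.act(obs), True   # action from the suboptimal prior
    return sac.act(obs), False        # action from the learned policy

# Collect guided rollout transitions into a shared replay buffer.
prior, sac = RuleBasedLaneChangeController(), SACPolicyStub()
replay_buffer = deque(maxlen=100_000)
obs = {"lateral_pos": 0.8, "target_lane_center": 0.0}
for _ in range(100):
    action, from_prior = guided_action(obs, sac, prior, guide_prob=0.5)
    replay_buffer.append((obs, action, from_prior))  # SAC trains off-policy on all of it
```

A common refinement of this pattern is to anneal `guide_prob` toward zero as training progresses, so the agent relies on the prior early on and on its own policy once it outperforms the demonstrations.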
Zhihao Zhang
Electrical and Computer Engineering, The Ohio State University
Chengyang Peng
The Ohio State University
Robotics
Ekim Yurtsever
The Ohio State University
Machine Learning · Computer Vision · Automated Driving Systems
Keith A. Redmill
Department of Electrical and Computer Engineering, The Ohio State University