🤖 AI Summary
This work addresses the performance limitations of end-to-end autonomous driving policies arising from the mismatch between open-loop training and closed-loop execution, as well as the rendering gap and high computational cost of rendering-based reinforcement learning. To overcome these challenges, the authors propose PerlAD, a vector-space pseudo-simulation reinforcement learning framework that leverages offline data to construct an efficient, rendering-free trial-and-error environment. The key innovations are a conditional reactive world model that bridges the gap between static datasets and dynamic interaction, and a hierarchical decoupled planner that integrates imitation and reinforcement learning. Evaluated on Bench2Drive, PerlAD achieves a 10.29% improvement in Driving Score over the previous state-of-the-art end-to-end RL method and demonstrates high reliability in DOS occlusion scenarios—all without requiring costly online interaction.
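To make the rendering-free training idea concrete, the following is a minimal toy sketch of a vector-space pseudo-simulation step: a reactive world model rolls agent states forward conditioned on the ego plan, so the policy can collect reward signals without rendering any sensor data. All function names, the yielding heuristic, and the reward terms here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def world_model_step(agent_states, ego_plan):
    """Toy reactive world model: agents keep their velocity but slow down
    when the ego's planned position comes within a safety radius.
    (Hypothetical stand-in for PerlAD's learned conditional world model.)"""
    next_states = []
    for pos, vel in agent_states:
        if np.linalg.norm(pos - ego_plan[0]) < 5.0:  # ego too close: yield
            vel = vel * 0.5
        next_states.append((pos + vel, vel))
    return next_states

def rollout(policy, agent_states, ego_state, horizon=10):
    """Collect a reward trajectory entirely in vector space (no rendering)."""
    total_reward = 0.0
    for _ in range(horizon):
        ego_plan = policy(ego_state, agent_states)   # planned ego waypoints
        agent_states = world_model_step(agent_states, ego_plan)
        ego_state = ego_plan[0]                      # execute first waypoint
        # Illustrative dense reward: forward progress minus proximity penalty.
        nearest = min(np.linalg.norm(ego_state - p) for p, _ in agent_states)
        total_reward += ego_state[0] * 0.01 - (1.0 if nearest < 2.0 else 0.0)
    return total_reward
```

Because every step operates on a handful of state vectors rather than rendered frames, trial-and-error rollouts like this are cheap enough to run at scale on offline data.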
📝 Abstract
End-to-end autonomous driving policies based on Imitation Learning (IL) often struggle in closed-loop execution due to the misalignment between open-loop training objectives and real driving requirements. While Reinforcement Learning (RL) offers a solution by directly optimizing driving goals via reward signals, existing rendering-based training environments introduce a rendering gap and are inefficient due to high computational costs. To overcome these challenges, we present a novel pseudo-simulation-based RL method for closed-loop end-to-end autonomous driving, PerlAD. From offline datasets, PerlAD constructs a pseudo-simulation that operates in vector space, enabling efficient, rendering-free trial-and-error training. To bridge the gap between static datasets and dynamic closed-loop environments, PerlAD introduces a prediction world model that generates reactive agent trajectories conditioned on the ego vehicle's plan. Furthermore, to facilitate efficient planning, PerlAD utilizes a hierarchical decoupled planner that combines IL for lateral path generation with RL for longitudinal speed optimization. Comprehensive experimental results demonstrate that PerlAD achieves state-of-the-art performance on the Bench2Drive benchmark, surpassing the previous end-to-end RL method by 10.29% in Driving Score without requiring expensive online interactions. Additional evaluations on the DOS benchmark further confirm its reliability in handling safety-critical occlusion scenarios.
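The hierarchical decoupling described above can be sketched as two independent heads composed at plan time: an IL head fixes the path geometry (lateral), and an RL head picks a speed along it (longitudinal). The sketch below uses hypothetical stand-ins for both heads (a route smoother and a discrete Q-value argmax); none of these names or design details come from PerlAD itself.

```python
import numpy as np

def il_lateral_head(route_points):
    """Stand-in for an IL path generator: subsample the route into a
    fixed number of path waypoints (geometry only, no timing)."""
    idx = np.linspace(0, len(route_points) - 1, num=8)
    return np.array([route_points[int(round(i))] for i in idx])

def rl_longitudinal_head(path, q_values):
    """Stand-in for an RL speed policy: choose the discrete target speed
    with the highest learned Q-value."""
    speeds = np.array([0.0, 2.0, 4.0, 6.0, 8.0])  # candidate speeds (m/s)
    return speeds[int(np.argmax(q_values))]

def plan(route_points, q_values, dt=0.5):
    """Compose lateral path and longitudinal speed into timed waypoints."""
    path = il_lateral_head(route_points)
    v = rl_longitudinal_head(path, q_values)
    # Resample the path by arc length so consecutive waypoints are v*dt apart.
    seg = np.linalg.norm(np.diff(path, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    targets = np.arange(0.0, s[-1], v * dt) if v > 0 else np.array([0.0])
    xs = np.interp(targets, s, path[:, 0])
    ys = np.interp(targets, s, path[:, 1])
    return np.stack([xs, ys], axis=1)
```

The appeal of this decomposition is that the RL component only searches over a low-dimensional speed decision, while the path geometry stays anchored to expert demonstrations, which keeps reward-driven exploration tractable and safe.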