PerlAD: Towards Enhanced Closed-loop End-to-end Autonomous Driving with Pseudo-simulation-based Reinforcement Learning

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance limitations of end-to-end autonomous driving policies arising from the mismatch between open-loop training and closed-loop execution, as well as the rendering gap and high computational cost of rendering-based reinforcement learning. To overcome these challenges, the authors propose PerlAD, a vector-space pseudo-simulation reinforcement learning framework that leverages offline data to construct an efficient, rendering-free trial-and-error environment. The key innovations are a conditional reactive world model that bridges the gap between static datasets and dynamic interaction, and a hierarchical decoupled planner that integrates imitation and reinforcement learning. Evaluated on Bench2Drive, PerlAD achieves a 10.29% improvement in Driving Score over the previous state-of-the-art end-to-end RL method and demonstrates high reliability in DOS occlusion scenarios, all without requiring costly online interaction.
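The "conditional reactive world model" in the summary can be pictured as follows: instead of replaying logged agent trajectories verbatim, surrounding agents' next states are predicted conditioned on the ego plan, turning an offline dataset into an interactive vector-space environment with no rendering. The sketch below is a toy illustration of that idea; all names, dynamics, and thresholds are assumptions for exposition, not the paper's model.

```python
# Toy 1-D sketch of a reactive world-model step: a following/lead agent
# yields if the ego's planned position encroaches within a safe gap,
# otherwise it keeps its current speed. Positions are longitudinal
# distances (m) along the lane, speeds are per-step displacements.

def reactive_step(agent_pos: float, agent_speed: float,
                  ego_next_pos: float, safe_gap: float = 5.0):
    """One vector-space rollout step conditioned on the ego plan."""
    gap = agent_pos - ego_next_pos
    if 0.0 < gap < safe_gap:
        # Ego plan encroaches: the agent reacts by braking.
        agent_speed = max(0.0, agent_speed - 2.0)
    return agent_pos + agent_speed, agent_speed

# Agent 10 m ahead; ego plans to advance to 7 m, so the gap (3 m)
# is inside the safe gap and the agent slows from 6 to 4.
pos, spd = reactive_step(agent_pos=10.0, agent_speed=6.0, ego_next_pos=7.0)
print(pos, spd)  # 14.0 4.0
```

Because the step is conditioned on `ego_next_pos`, different ego plans yield different agent futures, which is what makes trial-and-error RL on static data possible.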

📝 Abstract
End-to-end autonomous driving policies based on Imitation Learning (IL) often struggle in closed-loop execution due to the misalignment between open-loop training objectives and real closed-loop driving requirements. While Reinforcement Learning (RL) offers a solution by directly optimizing driving goals via reward signals, rendering-based training environments introduce a rendering gap and are inefficient due to high computational cost. To overcome these challenges, we present a novel Pseudo-simulation-based RL method for closed-loop end-to-end autonomous driving, PerlAD. Based on offline datasets, PerlAD constructs a pseudo-simulation that operates in vector space, enabling efficient, rendering-free trial-and-error training. To bridge the gap between static datasets and dynamic closed-loop environments, PerlAD introduces a prediction world model that generates reactive agent trajectories conditioned on the ego vehicle's plan. Furthermore, to facilitate efficient planning, PerlAD utilizes a hierarchical decoupled planner that combines IL for lateral path generation and RL for longitudinal speed optimization. Comprehensive experimental results demonstrate that PerlAD achieves state-of-the-art performance on the Bench2Drive benchmark, surpassing the previous E2E RL method by 10.29% in Driving Score without requiring expensive online interactions. Additional evaluations on the DOS benchmark further confirm its reliability in handling safety-critical occlusion scenarios.
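The hierarchical decoupled planner described in the abstract separates the two control axes: an IL head proposes the lateral path, and an RL policy sets the longitudinal speed along it. A minimal sketch of that decomposition, with stand-in heuristics in place of trained networks (all function names, shapes, and numbers here are illustrative assumptions, not the paper's implementation):

```python
# Sketch of a hierarchical decoupled planner: IL proposes lateral geometry,
# RL chooses longitudinal speed along that geometry.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Trajectory:
    points: List[Tuple[float, float]]  # (x, y) waypoints in the ego frame (m)
    speeds: List[float]                # target speed at each waypoint (m/s)

def il_lateral_path(n_points: int = 5,
                    lane_offset: float = 0.0) -> List[Tuple[float, float]]:
    """Stand-in for an imitation-learned path head: a straight path at a
    fixed lateral offset. A real head would regress this from features."""
    return [(2.0 * (i + 1), lane_offset) for i in range(n_points)]

def rl_speed_profile(path: List[Tuple[float, float]], lead_gap: float,
                     v_max: float = 10.0) -> List[float]:
    """Stand-in for an RL speed policy: slow down as the gap to the lead
    vehicle shrinks. A real policy would be trained on reward signals in
    the pseudo-simulation."""
    v = min(v_max, max(0.0, 0.5 * lead_gap))  # simple proportional rule
    return [v] * len(path)

def plan(lead_gap: float) -> Trajectory:
    path = il_lateral_path()
    return Trajectory(path, rl_speed_profile(path, lead_gap))

traj = plan(lead_gap=8.0)
print(traj.speeds[0])  # 4.0: reduced speed for an 8 m gap
```

Decoupling lets the imitation signal handle the well-constrained geometric part while RL optimizes the safety-critical timing part, which is where open-loop imitation tends to fail.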
Problem

Research questions and friction points this paper is trying to address.

end-to-end autonomous driving
closed-loop execution
imitation learning
reinforcement learning
simulation gap
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pseudo-simulation
Reinforcement Learning
End-to-end Autonomous Driving
World Model
Hierarchical Planning
Yinfeng Gao
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; also with Xiaomi EV and the State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Qichao Zhang
Institute of Automation, Chinese Academy of Sciences
Artificial Intelligence, Reinforcement Learning, Game Theory, Adaptive Dynamic Programming
Deqing Liu
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; and School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Zhongpu Xia
State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; and School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Guang Li
Assistant Professor, Hokkaido University
Dataset Distillation, Self-Supervised Learning, Data-Centric AI, Medical Image Analysis
Kun Ma
University of Jinan
Model-driven Engineering, Big Data Management, Data-Intensive Computing
Guang Chen
Xiaomi EV
Hangjun Ye
Xiaomi EV
Long Chen
Xiaomi EV
Da-Wei Ding
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
Dongbin Zhao
Institute of Automation, Chinese Academy of Sciences
Deep Reinforcement Learning, Adaptive Dynamic Programming, Game AI, Smart Driving, Robotics