Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Applying RL to end-to-end autonomous driving is hampered by training instability and by the reliance of existing methods on privileged information (e.g., ground-truth states) as input, while imitation learning suffers from causal confusion and distribution shift. To address this, we propose Raw2Drive, a dual-stream model-based RL (MBRL) framework. It comprises two world models: a privileged world model that takes privileged information as input and is trained first together with a neural planner, and a raw sensor world model driven by raw sensor inputs. A Guidance Mechanism transfers knowledge between them by keeping the raw sensor world model consistent with the privileged one during rollouts. The resulting policy takes raw sensor data as input and is trained with RL, with the privileged world model serving as an auxiliary guide during training. Raw2Drive achieves state-of-the-art performance on CARLA Leaderboard 2.0 and Bench2Drive, and is so far the only RL-based end-to-end method on either.

📝 Abstract

Reinforcement Learning (RL) can mitigate the causal confusion and distribution shift inherent to imitation learning (IL). However, applying RL to end-to-end autonomous driving (E2E-AD) remains an open problem because of its training difficulty, and IL is still the mainstream paradigm in both academia and industry. Recently, Model-based Reinforcement Learning (MBRL) has demonstrated promising results in neural planning; however, these methods typically require privileged information as input rather than raw sensor data. We fill this gap by designing Raw2Drive, a dual-stream MBRL approach. First, we efficiently train an auxiliary privileged world model paired with a neural planner that uses privileged information as input. Next, we introduce a raw sensor world model trained via our proposed Guidance Mechanism, which ensures consistency between the raw sensor world model and the privileged world model during rollouts. Finally, the raw sensor world model incorporates the prior knowledge embedded in the heads of the privileged world model to effectively guide the training of the raw sensor policy. Raw2Drive is so far the only RL-based end-to-end method on CARLA Leaderboard 2.0 and Bench2Drive, and it achieves state-of-the-art performance.
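
The rollout-consistency idea behind the Guidance Mechanism can be illustrated with a minimal sketch. It assumes toy latent dynamics of the form z_{t+1} = tanh(A z_t + B a_t) and a mean-squared-error distance between the two latent trajectories; neither choice is taken from the paper, and every name here (`rollout`, `guidance_loss`, the matrices) is hypothetical.

```python
import numpy as np

def rollout(step_matrix, action_matrix, z0, actions):
    """Roll a toy latent model forward: z_{t+1} = tanh(A z_t + B a_t)."""
    latents = []
    z = z0
    for a in actions:
        z = np.tanh(step_matrix @ z + action_matrix @ a)
        latents.append(z)
    return np.stack(latents)  # shape: (horizon, latent_dim)

def guidance_loss(raw_latents, priv_latents):
    """Assumed consistency term: mean squared gap between the two rollouts."""
    return float(np.mean((raw_latents - priv_latents) ** 2))

rng = np.random.default_rng(0)
latent_dim, action_dim, horizon = 4, 2, 5

# Stand-in for the privileged world model (trained first, then held fixed).
A_priv = rng.normal(scale=0.5, size=(latent_dim, latent_dim))
B_priv = rng.normal(scale=0.5, size=(latent_dim, action_dim))

# Stand-in for the raw sensor world model: slightly different dynamics.
A_raw = A_priv + rng.normal(scale=0.1, size=A_priv.shape)
B_raw = B_priv.copy()

z0 = rng.normal(size=latent_dim)
actions = rng.normal(size=(horizon, action_dim))

priv_traj = rollout(A_priv, B_priv, z0, actions)
raw_traj = rollout(A_raw, B_raw, z0, actions)

loss_misaligned = guidance_loss(raw_traj, priv_traj)  # positive: rollouts differ
loss_aligned = guidance_loss(priv_traj, priv_traj)    # zero: identical rollouts
```

In a training loop under these assumptions, a term like `loss_misaligned` would be minimized with respect to the raw sensor model's parameters only, so that its rollouts track those of the privileged model.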

Problem

Research questions and friction points this paper is trying to address.

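IL suffers from causal confusion and distribution shift, which RL can mitigate

Existing MBRL planners require privileged information rather than raw sensor data

Raw sensor and privileged world models must be aligned for RL training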

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-stream MBRL for raw sensor input

Guidance Mechanism aligns world models

Combines privileged and raw sensor knowledge
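
One way to picture how knowledge in the privileged heads could be reused on the raw sensor side is sketched below. It assumes a linear reward head, taken from the privileged world model and kept frozen, that scores latents produced by the raw sensor world model once the two latent spaces are aligned. The head type, its weights, and the helper names (`LinearHead`, `discounted_return`) are all hypothetical and not drawn from the paper.

```python
import numpy as np

class LinearHead:
    """Hypothetical frozen prediction head: latent vector -> scalar."""

    def __init__(self, weights, bias):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = float(bias)

    def __call__(self, latents):
        # latents has shape (horizon, latent_dim); returns shape (horizon,)
        return latents @ self.weights + self.bias

def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over an imagined rollout."""
    discounts = gamma ** np.arange(len(rewards))
    return float(np.sum(discounts * rewards))

# Stand-in reward head, assumed to come from the privileged world model.
priv_reward_head = LinearHead(weights=[1.0, -0.5, 0.0], bias=0.1)

# Latents from an imagined rollout of the raw sensor world model (toy values).
raw_latents = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 1.0],
])

rewards = priv_reward_head(raw_latents)
imagined_return = discounted_return(rewards, gamma=0.5)
```

Under these assumptions, a signal such as `imagined_return` is what the raw sensor policy would be trained to increase, which only makes sense if the alignment step keeps raw latents in a space the privileged head can interpret.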

🔎 Similar Papers

Enhancing End-to-End Autonomous Driving with Latent World Model