Drive-R1: Bridging Reasoning and Planning in VLMs for Autonomous Driving with Reinforcement Learning

📅 2025-06-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the decoupling of reasoning and planning in vision-language models (VLMs) for autonomous driving motion planning—specifically, overreliance on historical inputs and misalignment between chain-of-thought (CoT) reasoning and actual trajectory outputs—this paper proposes a compact, task-specific VLM trained end-to-end via joint optimization. Methodologically, we introduce a two-stage training framework: first, supervised fine-tuning using CoT-annotated data; second, reinforcement learning with dual reward signals—trajectory prediction accuracy and meta-action plausibility—to explicitly align reasoning paths with planning outputs. Our key contribution is the first explicit integration of CoT generation into the motion planning optimization objective, enabling co-training of reasoning strategies and control decisions. Evaluated on nuScenes and DriveLM-nuScenes benchmarks, our approach significantly outperforms existing VLM-based methods, demonstrating both effectiveness and generalizability of unified reasoning-planning modeling.

📝 Abstract
Large vision-language models (VLMs) for autonomous driving (AD) are evolving beyond perception and cognition tasks toward motion planning. However, we identify two critical challenges in this direction: (1) VLMs tend to learn shortcuts by relying heavily on historical input information, achieving seemingly strong planning results without genuinely understanding the visual inputs; and (2) the chain-of-thought (CoT) reasoning processes are often misaligned with the motion planning outcomes, and how to effectively leverage complex reasoning capability to enhance planning remains largely underexplored. In this paper, we start from a small-scale domain-specific VLM and propose Drive-R1, designed to bridge scenario reasoning and motion planning for AD. Drive-R1 first undergoes supervised fine-tuning on an elaborately constructed dataset containing both long and short CoT data, and is encouraged to reason step by step from visual input to final planning decisions. Subsequently, Drive-R1 is trained within a reinforcement learning framework that incentivizes the discovery of reasoning paths that are more informative for planning, guided by rewards based on predicted trajectories and meta-actions. Experimental evaluations on the nuScenes and DriveLM-nuScenes benchmarks demonstrate that Drive-R1 achieves superior performance compared to existing state-of-the-art VLMs. We believe that Drive-R1 presents a promising direction for bridging reasoning and planning in AD, offering methodological insights for future research and applications.
Problem

Research questions and friction points this paper is trying to address.

VLMs rely on history inputs without understanding visuals
Misalignment between reasoning processes and motion planning outcomes
Bridging scenario reasoning and motion planning in autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning enhances reasoning-planning alignment
Supervised fine-tuning with diverse chain-of-thought data
Reward-driven trajectory and meta-action optimization
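The dual reward signal described above (trajectory prediction accuracy plus meta-action plausibility) could be sketched roughly as follows. This is a minimal illustration, not the paper's actual formulation: the function name `planning_reward`, the exponential mapping of displacement error, and the weights `w_traj`/`w_meta` are all assumptions for clarity.

```python
import math

def planning_reward(pred_traj, gt_traj, pred_meta, gt_meta,
                    w_traj=1.0, w_meta=0.5):
    """Hypothetical dual reward combining trajectory accuracy and
    meta-action plausibility (weights and form are illustrative).

    pred_traj / gt_traj: equal-length lists of (x, y) waypoints.
    pred_meta / gt_meta: discrete meta-actions, e.g. "turn_left".
    """
    # Average L2 displacement error between predicted and ground-truth waypoints.
    ade = sum(math.dist(p, g) for p, g in zip(pred_traj, gt_traj)) / len(gt_traj)
    # Map error into (0, 1]: a perfect trajectory scores 1, large errors approach 0.
    r_traj = math.exp(-ade)
    # Binary reward for matching the ground-truth meta-action.
    r_meta = 1.0 if pred_meta == gt_meta else 0.0
    return w_traj * r_traj + w_meta * r_meta
```

A scalar reward of this shape can then drive a policy-gradient update on the VLM, so that CoT reasoning paths leading to accurate trajectories and plausible meta-actions are reinforced jointly.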
Yue Li
University of Science and Technology of China
Meng Tian
Huawei Noah's Ark Lab
Dechang Zhu
Huawei Noah's Ark Lab
Jiangtong Zhu
XJTU
Zhenyu Lin
Huawei Noah's Ark Lab
Zhiwei Xiong
University of Science and Technology of China
computational photography, biomedical image analysis
Xinhai Zhao
Huawei Noah's Ark Lab