ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving

📅 2025-05-26
🤖 AI Summary
To address key bottlenecks of multimodal large language models (MLLMs) in closed-loop end-to-end autonomous driving (poor generalization, opaque decision-making, and misalignment with motion planning), this paper proposes a unified framework that integrates scene prediction and decision reasoning. Methodologically, it introduces a dual-path reasoning mechanism that combines self-supervised next-scene prediction with supervised chain-of-thought (CoT) decision reasoning. It also constructs PDR, the first planning-oriented decision reasoning dataset (210k samples), and performs MLLM fine-tuning and knowledge distillation. The core contribution is aligning visual representations with executable driving semantics, which yields causally grounded, interpretable decisions and strong zero-shot generalization. On Bench2Drive, experiments show a 19% reduction in L2 trajectory error and a 16.1-point gain in driving score over the mainstream E2E imitation learning baseline. In zero-shot transfer to the DOS benchmark, the method also achieves state-of-the-art performance.
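The L2 trajectory error cited above is commonly computed as the mean Euclidean distance between predicted and ground-truth waypoints. The sketch below assumes 2D waypoints that are already time-aligned. The exact horizon and averaging that Bench2Drive uses are not stated here, and the function names are hypothetical. It reads "19% reduction" as a relative change, which is what `relative_reduction` computes.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) waypoint in metres

def mean_l2_error(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average Euclidean (L2) distance between time-aligned predicted and ground-truth waypoints."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    # math.dist returns the Euclidean distance between two points (Python 3.8+).
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def relative_reduction(baseline: float, ours: float) -> float:
    """Fractional reduction of `ours` relative to `baseline` (e.g. 0.19 means 19% lower)."""
    return (baseline - ours) / baseline
```

Take an illustrative example (not the paper's data). A prediction `[(0, 0), (1, 1)]` against ground truth `[(0, 0), (1, 0)]` is off by 0 m and 1 m, so its mean L2 error is 0.5.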

📝 Abstract
Owing to their powerful vision-language reasoning and generalization abilities, multimodal large language models (MLLMs) have garnered significant attention in the field of end-to-end (E2E) autonomous driving. However, their application to closed-loop systems remains underexplored, and current MLLM-based methods have not shown clear superiority over mainstream E2E imitation learning approaches. In this work, we propose ReasonPlan, a novel MLLM fine-tuning framework designed for closed-loop driving through holistic reasoning with a self-supervised Next Scene Prediction task and a supervised Decision Chain-of-Thought process. This dual mechanism encourages the model to align visual representations with actionable driving context, while promoting interpretable and causally grounded decision making. We curate a planning-oriented decision reasoning dataset, namely PDR, comprising 210k diverse and high-quality samples. On the Bench2Drive benchmark, our method outperforms the mainstream E2E imitation learning method by a large margin: 19% in L2 error and 16.1 points in driving score. Furthermore, ReasonPlan demonstrates strong zero-shot generalization on the unseen DOS benchmark, highlighting its adaptability in handling zero-shot corner cases. Code and dataset will be available at https://github.com/Liuxueyi/ReasonPlan.
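The closed-loop driving score reported on Bench2Drive is assumed here to follow the CARLA leaderboard convention. Each route's completion percentage is multiplied by an infraction penalty, which is the product of one coefficient per infraction, and the results are averaged over routes. The sketch below makes that assumption explicit. It takes penalty coefficients as inputs rather than hardcoding any official values, and `RouteResult` and `driving_score` are hypothetical names.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List

@dataclass
class RouteResult:
    route_completion: float  # percentage of the route completed, 0-100
    # Hypothetical: one multiplier in (0, 1] per infraction incurred on this route.
    penalties: List[float] = field(default_factory=list)

def driving_score(results: List[RouteResult]) -> float:
    """Mean over routes of route completion times the product of its infraction penalties."""
    if not results:
        raise ValueError("need at least one route")
    # math.prod of an empty list is 1, so a clean route keeps its full completion score.
    per_route = [r.route_completion * prod(r.penalties) for r in results]
    return sum(per_route) / len(per_route)
```

As an illustrative example (not the paper's data), take one fully completed route with no infractions and one route at 80% completion with a single 0.5 penalty. Their per-route scores are 100 and 40, so they average to a driving score of 70. Under this reading, a gain of 16.1 is an absolute difference on the 0-100 scale.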
Problem

Research questions and friction points this paper is trying to address.

Enhancing closed-loop autonomous driving with MLLMs
Improving interpretable decision-making via scene prediction
Addressing zero-shot generalization in driving scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLLM fine-tuning for closed-loop driving
Self-supervised Next Scene Prediction task
Supervised Decision Chain-of-Thought process
Xueyi Liu
Institute of Automation, Chinese Academy of Sciences
Zuodong Zhong
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
Yuxin Guo
SKL-MAIS, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Yun-Fu Liu
EACON, Fujian, China
Zhiguo Su
EACON, Fujian, China
Qichao Zhang
Institute of Automation, Chinese Academy of Sciences
Artificial Intelligence, Reinforcement Learning, Game Theory, Adaptive Dynamic Programming
Junli Wang
Tsinghua University
Natural Language Processing
Yinfeng Gao
School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing, China
Yupeng Zheng
Institute of Automation, Chinese Academy of Sciences
Qiao Lin
EACON, Fujian, China
Huiyong Chen
EACON, Fujian, China
Dongbin Zhao
Institute of Automation, Chinese Academy of Sciences
Deep Reinforcement Learning, Adaptive Dynamic Programming, Game AI, Smart Driving, Robotics