FutureX: Enhance End-to-End Autonomous Driving via Latent Chain-of-Thought World Model

📅 2025-12-11
🤖 AI Summary
End-to-end planners can lack robustness in dynamic traffic because they neglect the interaction between the ego vehicle and its environment. To address this, the paper proposes FutureX, a world model framework built on latent (implicit) Chain-of-Thought (CoT) reasoning. An Auto-think Switch adaptively selects between a "Thinking mode" (multi-step CoT rollouts in latent space) and an "Instant mode" (a single forward pass), so that the planner can model future scene evolution and refine its trajectory only when the scene calls for it. The framework combines a lightweight Summarizer Module with a Latent World Model to balance reasoning depth against real-time performance. On NAVSIM, the approach improves TransFuser's PDMS by 6.2 and reduces collisions while, according to this summary, keeping inference latency at the millisecond level.

📝 Abstract
In autonomous driving, end-to-end planners learn scene representations from raw sensor data and use them to generate a motion plan or control actions. However, relying exclusively on the current scene for motion planning may result in suboptimal responses in highly dynamic traffic environments, where ego actions further alter the future scene. To model the evolution of future scenes, we use a World Model to represent how the ego vehicle and its environment interact and change over time, which entails complex reasoning. Chain of Thought (CoT) offers a promising solution by forecasting a sequence of future thoughts that subsequently guide trajectory refinement. In this paper, we propose FutureX, a CoT-driven pipeline that enhances end-to-end planners to perform complex motion planning via future scene latent reasoning and trajectory refinement. Specifically, the Auto-think Switch examines the current scene and decides whether additional reasoning is required to yield a higher-quality motion plan. Once FutureX enters the Thinking mode, the Latent World Model conducts a CoT-guided rollout to predict future scene representations, enabling the Summarizer Module to further refine the motion plan. Otherwise, FutureX operates in an Instant mode and generates the motion plan in a single forward pass for relatively simple scenes. Extensive experiments demonstrate that FutureX enhances existing methods by producing more rational motion plans and fewer collisions without compromising efficiency, thereby achieving substantial overall performance gains, e.g., a 6.2 PDMS improvement for TransFuser on NAVSIM. Code will be released.
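The control flow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation (the code has not been released): the names `plan`, `should_think`, `world_model_step`, `summarizer`, and `num_steps` are hypothetical, latents and trajectories are plain lists of floats, and the toy functions are placeholders for learned networks. Refining once after the full rollout is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class PlanResult:
    trajectory: Vector
    mode: str           # "instant" or "thinking"
    rollout_steps: int  # number of latent world-model steps taken


def plan(
    scene_latent: Vector,
    base_planner: Callable[[Vector], Vector],
    should_think: Callable[[Vector], bool],
    world_model_step: Callable[[Vector, Vector], Vector],
    summarizer: Callable[[List[Vector], Vector], Vector],
    num_steps: int = 3,
) -> PlanResult:
    """Hypothetical switch between a single forward pass and a latent rollout."""
    # The underlying end-to-end planner always produces an initial plan.
    trajectory = base_planner(scene_latent)

    # Instant mode: the switch judges the scene simple enough to stop here.
    if not should_think(scene_latent):
        return PlanResult(trajectory, "instant", 0)

    # Thinking mode: roll the latent scene forward, conditioned on the plan.
    latent = scene_latent
    future_latents: List[Vector] = []
    for _ in range(num_steps):
        latent = world_model_step(latent, trajectory)
        future_latents.append(latent)

    # The summarizer condenses the predicted futures and refines the plan.
    refined = summarizer(future_latents, trajectory)
    return PlanResult(refined, "thinking", num_steps)


# Toy stand-ins (assumptions for illustration only, not learned models).
def toy_planner(latent: Vector) -> Vector:
    return [0.5 * x for x in latent]


def toy_switch(latent: Vector) -> bool:
    # Treat a large latent magnitude as a proxy for a complex scene.
    return max(abs(x) for x in latent) > 1.0


def toy_world_model(latent: Vector, trajectory: Vector) -> Vector:
    return [x + 0.1 * t for x, t in zip(latent, trajectory)]


def toy_summarizer(futures: List[Vector], trajectory: Vector) -> Vector:
    last = futures[-1]
    return [0.5 * (t + 0.5 * x) for t, x in zip(trajectory, last)]


simple = plan([0.2, -0.4], toy_planner, toy_switch, toy_world_model, toy_summarizer)
complex_ = plan([2.0, 0.0], toy_planner, toy_switch, toy_world_model, toy_summarizer)
print(simple.mode, complex_.mode)  # prints "instant thinking"
```

In this sketch the rollout cost is paid only when the switch fires, which is the mechanism the abstract credits for improving plans without compromising efficiency.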
Problem

Research questions and friction points this paper is trying to address.

Enhance autonomous driving planners via future scene reasoning
Model ego-environment interaction with Chain-of-Thought world models
Improve motion planning quality and reduce collisions in dynamic traffic
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent World Model predicts future scene representations
Auto-think Switch decides when to use reasoning
Chain-of-Thought guides trajectory refinement for planning
Hongbin Lin
FNii-Shenzhen, SSE, CUHK-Shenzhen
Yiming Yang
FNii-Shenzhen, SSE, CUHK-Shenzhen
Yifan Zhang
MiroMind AI
Chaoda Zheng
Xpeng Motors
Jie Feng
Xidian University
Sheng Wang
Xpeng Motors
Zhennan Wang
Peng Cheng Lab
neural network design, deep learning, computer vision
Shijia Chen
Xpeng Motors
Boyang Wang
Xpeng Motors
Yu Zhang
Xpeng Motors
Xianming Liu
Xpeng Motors
Shuguang Cui
Distinguished Presidential Chair Professor, School of Science and Engineering, CUHKSZ
AI+Networking, Wireless Communications
Zhen Li
SSE, CUHK-Shenzhen, FNii-Shenzhen