WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing latent world models couple perception reconstruction with planning, which limits planning performance. This paper proposes WorldRFT, a planning-oriented latent world modeling framework with three components: (1) hierarchical planning decomposition, which aligns representation learning with the planning task; (2) a local-aware interactive iterative refinement mechanism that improves policy robustness; and (3) reinforcement fine-tuning with Group Relative Policy Optimization (GRPO), applied with trajectory Gaussianization and collision-aware rewards. The method combines a vision-geometry foundation model with temporal self-supervised modeling in latent space. In nuScenes open-loop evaluation, the collision rate drops by 83% (from 0.30% to 0.05%). In NavSim closed-loop testing with camera-only input, the approach reaches 87.8 PDMS, competitive with the LiDAR-based SOTA method DiffusionDrive (88.1).

📝 Abstract
Latent World Models enhance scene representation through temporal self-supervised learning, presenting a perception annotation-free paradigm for end-to-end autonomous driving. However, reconstruction-oriented representation learning entangles perception with planning, leading to suboptimal optimization for planning. To address this challenge, we propose WorldRFT, a planning-oriented latent world model framework that aligns scene representation learning with planning via a hierarchical planning decomposition and a local-aware interactive refinement mechanism, augmented by reinforcement learning fine-tuning (RFT) to enhance safety-critical policy performance. Specifically, WorldRFT integrates a vision-geometry foundation model to improve 3D spatial awareness, employs hierarchical planning task decomposition to guide representation optimization, and utilizes local-aware iterative refinement to derive a planning-oriented driving policy. Furthermore, we introduce Group Relative Policy Optimization (GRPO), which applies trajectory Gaussianization and collision-aware rewards to fine-tune the driving policy, yielding systematic improvements in safety. WorldRFT achieves state-of-the-art (SOTA) performance on both open-loop nuScenes and closed-loop NavSim benchmarks. On nuScenes, it reduces the collision rate by 83% (0.30% to 0.05%). On NavSim, using camera-only sensor input, it attains performance competitive with the LiDAR-based SOTA method DiffusionDrive (87.8 vs. 88.1 PDMS).
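The headline figures in the abstract can be checked with a few lines of arithmetic. The snippet below recomputes the relative collision-rate reduction and the PDMS gap from the quoted numbers; the function name `relative_reduction` is illustrative and does not come from the paper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop when a metric goes from `before` to `after`."""
    return (before - after) / before


# nuScenes open-loop collision rate, in percent, as quoted in the abstract.
collision_drop = relative_reduction(0.30, 0.05)

# NavSim PDMS: DiffusionDrive (LiDAR-based) minus WorldRFT (camera-only).
pdms_gap = 88.1 - 87.8

print(f"collision reduction: {collision_drop:.0%}")  # collision reduction: 83%
print(f"PDMS gap: {pdms_gap:.1f}")                   # PDMS gap: 0.3
```

The 83% figure is a relative reduction; in absolute terms the collision rate falls by 0.25 percentage points.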
Problem

Research questions and friction points this paper is trying to address.

Improves autonomous driving planning by disentangling perception from planning tasks
Enhances safety-critical policy performance through reinforcement learning fine-tuning
Addresses suboptimal optimization in reconstruction-oriented latent world models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Planning-oriented latent world model with reinforcement fine-tuning
Hierarchical planning decomposition and local-aware interactive refinement
Group Relative Policy Optimization with trajectory Gaussianization and collision-aware rewards
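
The page gives no detail on how trajectory Gaussianization or the collision-aware reward are defined, so the following is only a minimal sketch of the general idea under stated assumptions. The planner's waypoints are treated as the mean of an isotropic Gaussian with a fixed standard deviation, the reward is a binary collision penalty against point obstacles, and the advantage is the reward standardized within a sampled group, as in GRPO. The clipped probability ratio and KL regularizer of full GRPO are omitted, and every function name is hypothetical.

```python
import math
import random
import statistics

# All names here are hypothetical and for illustration only;
# none of them are taken from the WorldRFT paper or its code.


def sample_group(mean_traj, sigma, group_size, rng):
    """Sample a group of trajectories from a Gaussian centred on the planner output."""
    return [
        [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in mean_traj]
        for _ in range(group_size)
    ]


def log_prob(traj, mean_traj, sigma):
    """Log-density of a trajectory under the isotropic 2D Gaussian policy."""
    lp = 0.0
    for (x, y), (mx, my) in zip(traj, mean_traj):
        sq = (x - mx) ** 2 + (y - my) ** 2
        lp += -sq / (2.0 * sigma ** 2) - math.log(2.0 * math.pi * sigma ** 2)
    return lp


def collision_reward(traj, obstacles, safe_radius):
    """Assumed reward: -1 if any waypoint is within safe_radius of an obstacle, else 0."""
    for x, y in traj:
        for ox, oy in obstacles:
            if math.hypot(x - ox, y - oy) < safe_radius:
                return -1.0
    return 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: each reward standardized within its own group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def policy_gradient_step(mean_traj, group, advantages, sigma, lr):
    """One ascent step on mean_i[A_i * log pi(traj_i)] with respect to the Gaussian mean."""
    g = len(group)
    new_mean = []
    for t, (mx, my) in enumerate(mean_traj):
        gx = sum(a * (tr[t][0] - mx) for a, tr in zip(advantages, group)) / (g * sigma ** 2)
        gy = sum(a * (tr[t][1] - my) for a, tr in zip(advantages, group)) / (g * sigma ** 2)
        new_mean.append((mx + lr * gx, my + lr * gy))
    return new_mean


rng = random.Random(0)
mean_traj = [(float(t), 0.0) for t in range(1, 7)]  # straight path ahead
obstacles = [(4.0, 0.3)]                            # one obstacle near the path

group = sample_group(mean_traj, sigma=0.5, group_size=16, rng=rng)
rewards = [collision_reward(tr, obstacles, safe_radius=0.6) for tr in group]
advantages = group_relative_advantages(rewards)
mean_traj = policy_gradient_step(mean_traj, group, advantages, sigma=0.5, lr=0.1)
```

Because the advantage is computed relative to the group, no learned value function is needed: samples that avoid the obstacle receive a positive advantage only when other samples in the same group collide, and a group with identical rewards contributes no gradient.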

Pengxuan Yang
The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS
Ben Lu
Li Auto
Zhongpu Xia
The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS
Chao Han
PhD, Electrical Engineering, California Institute of Technology
Optical imaging systems, MEMS, Biomedical microdevices
Yinfeng Gao
The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, CAS
Teng Zhang
Li Auto
Kun Zhan
Li Auto
XianPeng Lang
Li Auto
Yupeng Zheng
Institute of Automation, Chinese Academy of Sciences
Qichao Zhang
Institute of Automation, Chinese Academy of Sciences
Artificial intelligence, reinforcement learning, game theory, adaptive dynamic programming