WorldRFT: Latent World Model Planning with Reinforcement Fine-Tuning for Autonomous Driving

📅 2025-12-22

🤖 AI Summary

In existing latent world models, reconstruction-oriented representation learning entangles perception with planning, which limits planning performance. This paper proposes a planning-oriented latent world modeling paradigm with three components: (1) hierarchical planning decomposition, which decouples representation learning from decision-making; (2) a local-aware interactive iterative refinement mechanism, which improves policy robustness; and (3) reinforcement fine-tuning with Group Relative Policy Optimization (GRPO), which models trajectories as Gaussians and is driven by collision-aware rewards. The method combines a vision-geometry foundation model with temporal self-supervised modeling in latent space. In open-loop evaluation on nuScenes, the collision rate drops by 83% (from 0.30% to 0.05%). In closed-loop evaluation on NavSim, using camera-only input, the approach reaches 87.8 PDMS, close to the LiDAR-based SOTA method DiffusionDrive (88.1).
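As a quick check, the 83% figure is the relative reduction between the two nuScenes collision rates quoted in the summary:

```latex
\frac{0.30\% - 0.05\%}{0.30\%} = \frac{0.25}{0.30} \approx 0.833 \approx 83\%
```

On NavSim, the remaining gap to DiffusionDrive is 88.1 - 87.8 = 0.3 PDMS points.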

📝 Abstract

Latent World Models enhance scene representation through temporal self-supervised learning, offering a paradigm for end-to-end autonomous driving that requires no perception annotations. However, reconstruction-oriented representation learning entangles perception with planning, leading to suboptimal optimization for planning. To address this challenge, we propose WorldRFT, a planning-oriented latent world model framework that aligns scene representation learning with planning via hierarchical planning decomposition and a local-aware interactive refinement mechanism, augmented by reinforcement fine-tuning (RFT) to enhance safety-critical policy performance. Specifically, WorldRFT integrates a vision-geometry foundation model to improve 3D spatial awareness, employs hierarchical planning task decomposition to guide representation optimization, and utilizes local-aware iterative refinement to derive a planning-oriented driving policy. Furthermore, we introduce Group Relative Policy Optimization (GRPO), which applies trajectory Gaussianization and collision-aware rewards to fine-tune the driving policy, yielding systematic improvements in safety. WorldRFT achieves state-of-the-art (SOTA) performance on both the open-loop nuScenes and the closed-loop NavSim benchmarks. On nuScenes, it reduces the collision rate by 83% (from 0.30% to 0.05%). On NavSim, using camera-only sensor input, it attains performance competitive with the LiDAR-based SOTA method DiffusionDrive (87.8 vs. 88.1 PDMS).
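The abstract describes GRPO fine-tuning only at a high level. The sketch below illustrates the general GRPO recipe (sample a group of outputs, standardize each reward against its group's mean and standard deviation, optimize a PPO-style clipped surrogate) applied to planned trajectories. It assumes that "trajectory Gaussianization" means treating the planner's waypoints as the mean of an isotropic Gaussian with a fixed sigma; that interpretation, the toy reward, and every function name here are hypothetical, not the authors' implementation. The KL penalty GRPO usually includes is omitted, and the code only computes objective values; in practice they would be differentiated with respect to the planner's parameters by an autodiff framework.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def sample_gaussian_trajectories(mean: Sequence[Point], sigma: float,
                                 group_size: int, rng: random.Random) -> List[Trajectory]:
    """Hypothetical 'trajectory Gaussianization': treat the planner's waypoints as the
    mean of an isotropic Gaussian with fixed std `sigma` and draw a group of samples."""
    return [[(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)) for x, y in mean]
            for _ in range(group_size)]


def gaussian_log_prob(traj: Sequence[Point], mean: Sequence[Point], sigma: float) -> float:
    """Log-density of `traj` under the isotropic Gaussian centred on `mean`."""
    log_norm = -math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for (x, y), (mx, my) in zip(traj, mean):
        for value, centre in ((x, mx), (y, my)):
            total += log_norm - 0.5 * ((value - centre) / sigma) ** 2
    return total


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantage: each reward standardized by its own group's mean and std."""
    mu = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mu) ** 2 for r in rewards) / len(rewards))
    return [(r - mu) / (std + eps) for r in rewards]


def grpo_surrogate(new_logps: Sequence[float], old_logps: Sequence[float],
                   advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate averaged over the group (to be maximized).
    The KL penalty to a reference policy is omitted in this sketch."""
    terms = []
    for lp_new, lp_old, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    rng = random.Random(0)
    plan = [(float(t), 0.0) for t in range(1, 7)]    # toy planner output (Gaussian mean)
    expert = [(float(t), 0.2) for t in range(1, 7)]  # toy expert trajectory
    group = sample_gaussian_trajectories(plan, sigma=0.5, group_size=8, rng=rng)
    # Toy reward (hypothetical): negative mean waypoint distance to the expert.
    rewards = [-sum(math.dist(p, q) for p, q in zip(traj, expert)) / len(expert)
               for traj in group]
    advantages = group_relative_advantages(rewards)
    old_logps = [gaussian_log_prob(traj, plan, 0.5) for traj in group]
    # Before any update the new and old policies coincide, so every ratio is 1.
    print(grpo_surrogate(old_logps, old_logps, advantages))
```

Because advantages are standardized within each group, a sampled trajectory is pushed up only if it scores better than its siblings, which is why GRPO needs no learned value critic.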

Problem

Improves autonomous driving planning by disentangling perception from planning tasks

Enhances safety-critical policy performance through reinforcement learning fine-tuning

Addresses suboptimal optimization in reconstruction-oriented latent world models

Innovation

Planning-oriented latent world model with reinforcement fine-tuning

Hierarchical planning decomposition and local-aware interactive refinement

Group Relative Policy Optimization with trajectory Gaussianization and collision-aware rewards
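The summary does not define the collision-aware reward. As an illustration only, one simple form combines an imitation term with a fixed penalty whenever a planned ego waypoint comes within a safety radius of another agent at the same timestep. The function names, the 1.5 m radius, and the penalty weight are hypothetical assumptions, not values from the paper.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def collides(ego: Sequence[Point], agents: Sequence[Sequence[Point]], radius: float) -> bool:
    """True if any ego waypoint is closer than `radius` to another agent at the same timestep."""
    for t, ego_xy in enumerate(ego):
        for agent in agents:
            if t < len(agent) and math.dist(ego_xy, agent[t]) < radius:
                return True
    return False


def collision_aware_reward(ego: Sequence[Point], expert: Sequence[Point],
                           agents: Sequence[Sequence[Point]],
                           radius: float = 1.5, penalty: float = 1.0) -> float:
    """Hypothetical reward: negative mean L2 error to the expert trajectory,
    minus a fixed `penalty` if the plan collides with any agent."""
    imitation = -sum(math.dist(p, q) for p, q in zip(ego, expert)) / len(expert)
    return imitation - (penalty if collides(ego, agents, radius) else 0.0)


if __name__ == "__main__":
    ego = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    expert = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
    far_agent = [(5.0, 5.0)] * 3
    near_agent = [(5.0, 5.0), (1.5, 0.0), (5.0, 5.0)]  # 0.5 m from the ego at t=1
    print(collision_aware_reward(ego, expert, [far_agent]))   # imitation term only
    print(collision_aware_reward(ego, expert, [near_agent]))  # collision penalty applied
```

A binary penalty like this is the simplest choice; a graded penalty based on the minimum clearance would give a smoother learning signal at the cost of an extra hyperparameter.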
