PlannerRFT: Reinforcing Diffusion Planners through Closed-Loop and Sample-Efficient Fine-Tuning

📅 2026-01-19
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the inefficiency of existing diffusion-based planners in leveraging reward signals during reinforcement fine-tuning, which often results in limited trajectory diversity and poor adaptability across scenarios. To overcome this, we propose PlannerRFT, a framework featuring a dual-branch optimization mechanism that refines the trajectory distribution and adaptively guides the denoising process—without altering the original inference pipeline—enabling efficient closed-loop reinforcement fine-tuning. Integrated with our custom high-speed simulation platform, nuMax, the approach supports large-scale parallel training. Evaluated on autonomous driving trajectory planning tasks, PlannerRFT significantly enhances performance by learning diverse yet realistic driving behaviors, while achieving a tenfold improvement in simulation efficiency.

📝 Abstract
Diffusion-based planners have emerged as a promising approach for human-like trajectory generation in autonomous driving. Recent works incorporate reinforcement fine-tuning to enhance the robustness of diffusion planners through reward-oriented optimization in a generation-evaluation loop. However, they struggle to generate multi-modal, scenario-adaptive trajectories, hindering the exploitation efficiency of informative rewards during fine-tuning. To resolve this, we propose PlannerRFT, a sample-efficient reinforcement fine-tuning framework for diffusion-based planners. PlannerRFT adopts a dual-branch optimization that simultaneously refines the trajectory distribution and adaptively guides the denoising process toward more promising exploration, without altering the original inference pipeline. To support parallel learning at scale, we develop nuMax, an optimized simulator that achieves 10 times faster rollouts than native nuPlan. Extensive experiments show that PlannerRFT yields state-of-the-art performance, with distinct behaviors emerging during the learning process.
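The generation-evaluation loop the abstract describes can be sketched in miniature: sample a batch of trajectories, score each with a reward, and nudge the sampler toward reward-weighted samples. The sketch below is a generic reward-weighted update, not PlannerRFT's dual-branch algorithm; the toy reward, the scalar-drift "planner," and all function names are illustrative assumptions standing in for a diffusion planner and a driving simulator.

```python
import math
import random

def reward(traj):
    """Toy reward (an assumption): prefer trajectories ending near a goal at x = 1.0."""
    return math.exp(-(traj[-1] - 1.0) ** 2)

def sample_trajectory(drift, sigma, horizon, rng):
    """Stand-in for a diffusion planner's sampler: each step adds Gaussian
    noise around a single learned per-step drift."""
    x, traj = 0.0, []
    for _ in range(horizon):
        x += drift + rng.gauss(0.0, sigma)
        traj.append(x)
    return traj

def fine_tune(steps=200, batch=64, horizon=5, sigma=0.05, seed=0):
    """Generation-evaluation loop: sample a batch, score it, then replace the
    drift with the reward-weighted average per-step increment of the batch."""
    rng = random.Random(seed)
    drift = 0.0
    for _ in range(steps):
        trajs = [sample_trajectory(drift, sigma, horizon, rng) for _ in range(batch)]
        ws = [reward(t) for t in trajs]
        z = sum(ws)
        drift = sum(w * (t[-1] / horizon) for w, t in zip(ws, trajs)) / z
    return drift

if __name__ == "__main__":
    # The drift converges toward goal / horizon = 1.0 / 5 = 0.2
    print(f"{fine_tune():.3f}")
```

The reward-weighted average plays the role that informative simulator rewards play in the paper's fine-tuning loop; a real implementation would instead update the diffusion model's denoising network and evaluate rollouts in closed loop.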
Problem

Research questions and friction points this paper is trying to address.

diffusion-based planners
reinforcement fine-tuning
multi-modal trajectories
scenario-adaptive planning
sample efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion-based planning
reinforcement fine-tuning
sample-efficient learning
dual-branch optimization
autonomous driving simulation
👥 Authors
Hongchen Li (Tongji University; Shanghai Innovation Institute; OpenDriveLab at The University of Hong Kong)
Tianyu Li (Fudan University; OpenDriveLab)
Jiazhi Yang (CUHK, MMLab)
Haochen Tian (Institute of Automation, Chinese Academy of Sciences)
Caojun Wang (Tongji University; Shanghai Innovation Institute; OpenDriveLab at The University of Hong Kong)
Lei Shi (Meituan)
Mingyang Shang (Li Auto Inc.)
Zengrong Lin (Li Auto Inc.)
Gaoqiang Wu (Li Auto Inc.)
Zhihui Hao (Li Auto Inc.)
Xianpeng Lang (Li Auto Inc.)
Jia Hu (University of Exeter)
Hongyang Li (Assistant Professor, University of Hong Kong)