Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

📅 2025-05-23
🤖 AI Summary
To address safety risks in autonomous driving trajectory planning, such as speeding and collisions, that arise from overreliance on expert demonstrations, this paper proposes Plan-R1, a two-stage framework that treats trajectory planning as language modeling. In the first stage, trajectories are discretized into motion tokens and an autoregressive predictor is trained on expert data via next-motion-token prediction. In the second stage, the model is fine-tuned with Group Relative Policy Optimization (GRPO) against explicit rule-based rewards (e.g., collision avoidance and speed-limit compliance), which reduces its dependence on imperfect human driving behavior. The key contribution is the formulation of trajectory planning as a sequential prediction task aligned with explicit planning principles, coupled with a rule-guided GRPO fine-tuning stage. Evaluated on the nuPlan benchmark, the approach reportedly reduces collision rate by 42% and traffic-rule violations by 37%, achieving state-of-the-art performance.
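As a rough illustration of what "discretizing a trajectory into motion tokens" can look like, the sketch below maps each per-step displacement to one integer on a uniform grid. The uniform grid, the function names `tokenize` and `detokenize`, and the parameters `step` and `half_bins` are all assumptions made for this example; the page does not describe how Plan-R1 actually builds its motion vocabulary.

```python
def tokenize(traj, step=0.5, half_bins=8):
    """Map each per-step displacement (dx, dy) to one integer token.

    Assumption: a uniform grid with cell size `step` metres and
    (2 * half_bins + 1) cells per axis; out-of-range moves are clipped.
    """
    width = 2 * half_bins + 1
    tokens = []
    for (x0, y0), (x1, y1) in zip(traj, traj[1:]):
        ix = max(-half_bins, min(half_bins, round((x1 - x0) / step)))
        iy = max(-half_bins, min(half_bins, round((y1 - y0) / step)))
        tokens.append((ix + half_bins) * width + (iy + half_bins))
    return tokens


def detokenize(tokens, start, step=0.5, half_bins=8):
    """Rebuild a trajectory by accumulating the decoded displacements."""
    width = 2 * half_bins + 1
    x, y = start
    traj = [(x, y)]
    for t in tokens:
        ix, iy = divmod(t, width)
        x += (ix - half_bins) * step
        y += (iy - half_bins) * step
        traj.append((x, y))
    return traj


# Hypothetical three-point trajectory (metres).
example = [(0, 0), (1.0, 0.0), (2.0, 0.5)]
example_tokens = tokenize(example)
```

Once a trajectory is a sequence of integers like `example_tokens`, an autoregressive model can be trained on it with the same next-token objective used for text.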

📝 Abstract
Safe and feasible trajectory planning is essential for real-world autonomous driving systems. However, existing learning-based planning methods often rely on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting unsafe behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a novel two-stage trajectory planning framework that formulates trajectory planning as a sequential prediction task, guided by explicit planning principles such as safety, comfort, and traffic rule compliance. In the first stage, we train an autoregressive trajectory predictor via next motion token prediction on expert data. In the second stage, we design rule-based rewards (e.g., collision avoidance, speed limits) and fine-tune the model using Group Relative Policy Optimization (GRPO), a reinforcement learning strategy, to align its predictions with these planning principles. Experiments on the nuPlan benchmark demonstrate that our Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance.
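The rule-based rewards mentioned in the abstract (collision avoidance, speed limits) could take a form like the following toy function. This is only a sketch under stated assumptions: circular obstacles, a fixed time step `dt`, the name `rule_based_reward`, and the particular scoring scheme are all hypothetical, and the actual reward terms used by Plan-R1 are not given on this page.

```python
import math


def rule_based_reward(traj, obstacles, speed_limit, dt=0.1, ego_radius=1.0):
    """Toy reward: 0.0 on any collision, else 1.0 minus the speeding fraction.

    traj        -- list of (x, y) ego positions sampled every `dt` seconds
    obstacles   -- list of (x, y, radius) circles (an assumed simplification)
    speed_limit -- limit in metres per second
    """
    # Collision rule: any overlap with an obstacle zeroes the reward.
    for x, y in traj:
        for ox, oy, orad in obstacles:
            if math.hypot(x - ox, y - oy) < ego_radius + orad:
                return 0.0

    # Speed rule: penalise the fraction of steps that exceed the limit.
    steps = list(zip(traj, traj[1:]))
    if not steps:
        return 1.0
    speeding = sum(
        1
        for (x0, y0), (x1, y1) in steps
        if math.hypot(x1 - x0, y1 - y0) / dt > speed_limit
    )
    return 1.0 - speeding / len(steps)


# Hypothetical trajectory: one step at 10 m/s, then one at about 2 m/s.
mixed = [(0.0, 0.0), (1.0, 0.0), (1.2, 0.0)]
mixed_reward = rule_based_reward(mixed, obstacles=[], speed_limit=5.0)
```

Because such a reward is computed directly from the rules, it does not depend on whether the expert demonstrations themselves obeyed them.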
Problem

Research questions and friction points this paper is trying to address.

Ensuring safe and feasible autonomous driving trajectory planning
Overcoming reliance on unsafe human driving demonstrations
Aligning trajectory predictions with safety and traffic rules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage trajectory planning framework
Autoregressive trajectory predictor training
Rule-based rewards with GRPO fine-tuning
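The "group relative" part of GRPO can be sketched in a few lines: several trajectories are sampled for the same scene, each is scored, and every reward is normalized against the mean and standard deviation of its own group, so no learned value function is needed. The function name, the example rewards, the use of the population standard deviation, and the `eps` constant are assumptions for illustration; this is not the authors' training code.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled trajectory's reward against its own group."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumed choice
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four trajectories sampled for one scene.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Trajectories that score above the group average receive a positive advantage and are made more likely by the policy update, while below-average ones are made less likely. If all samples in a group score the same, every advantage is zero and that group contributes no learning signal.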
Xiaolong Tang
Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
Meina Kan
Institute of Computing Technology, Chinese Academy of Sciences
Computer Vision, Pattern Recognition, Face Recognition
Shiguang Shan
Professor, Institute of Computing Technology, Chinese Academy of Sciences
Computer Vision, Pattern Recognition, Machine Learning, Face Recognition
Xilin Chen
Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences