Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

📅 2025-05-23

🤖 AI Summary

Overreliance on expert demonstrations can introduce safety risks, such as speeding and collisions, into autonomous driving trajectory planning. To address this, the paper proposes a safety-aligned language modeling paradigm. The method first discretizes trajectory sequences into motion tokens and trains an autoregressive trajectory predictor on expert data. It then introduces explicit rule-based rewards (e.g., collision avoidance and speed-limit compliance) and applies Group Relative Policy Optimization (GRPO) for reinforcement fine-tuning, which reduces the model's implicit dependence on human driving behavior. The key contribution is the first formalization of trajectory planning as a safety-constrained language modeling task, coupled with a rule-guided GRPO fine-tuning framework. Evaluated on the nuPlan benchmark, the approach reduces the collision rate by 42% and traffic-rule violations by 37%, achieving state-of-the-art performance.
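The tokenization step can be illustrated with a minimal Python sketch. It assumes a hypothetical scheme in which each per-step displacement (dx, dy) is snapped to a uniform grid and the grid cell index serves as the motion token. The constants NUM_BINS and MAX_STEP and the functions tokenize and detokenize are illustrative names, not Plan-R1's actual tokenizer, which may use a different vocabulary.

```python
from typing import List, Tuple

# Hypothetical vocabulary: per-step displacements (dx, dy) in metres are
# snapped to a uniform grid of NUM_BINS x NUM_BINS cells spanning
# [-MAX_STEP, MAX_STEP] on each axis. Plan-R1's real tokenizer may differ.
NUM_BINS = 21
MAX_STEP = 2.0
BIN_WIDTH = 2 * MAX_STEP / (NUM_BINS - 1)


def _to_bin(value: float) -> int:
    """Map one displacement component to the nearest grid index (clamped)."""
    clamped = max(-MAX_STEP, min(MAX_STEP, value))
    return round((clamped + MAX_STEP) / BIN_WIDTH)


def _from_bin(index: int) -> float:
    """Map a grid index back to the displacement at the bin centre."""
    return index * BIN_WIDTH - MAX_STEP


def tokenize(points: List[Tuple[float, float]]) -> List[int]:
    """Turn a trajectory of (x, y) waypoints into motion-token ids."""
    tokens = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        ix, iy = _to_bin(x1 - x0), _to_bin(y1 - y0)
        tokens.append(ix * NUM_BINS + iy)  # one id per (dx, dy) cell
    return tokens


def detokenize(start: Tuple[float, float], tokens: List[int]) -> List[Tuple[float, float]]:
    """Rebuild waypoints by integrating the decoded displacements."""
    x, y = start
    points = [(x, y)]
    for tok in tokens:
        ix, iy = divmod(tok, NUM_BINS)
        x, y = x + _from_bin(ix), y + _from_bin(iy)
        points.append((x, y))
    return points
```

For example, a trajectory advancing 1 m per step along x maps to a sequence of identical tokens, and with these constants each decoded step is off by at most half a bin width (0.1 m) per axis as long as the displacement stays within MAX_STEP.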

📝 Abstract

Safe and feasible trajectory planning is essential for real-world autonomous driving systems. However, existing learning-based planning methods often rely on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting unsafe behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a novel two-stage trajectory planning framework that formulates trajectory planning as a sequential prediction task, guided by explicit planning principles such as safety, comfort, and traffic rule compliance. In the first stage, we train an autoregressive trajectory predictor via next motion token prediction on expert data. In the second stage, we design rule-based rewards (e.g., collision avoidance, speed limits) and fine-tune the model using Group Relative Policy Optimization (GRPO), a reinforcement learning strategy, to align its predictions with these planning principles. Experiments on the nuPlan benchmark demonstrate that our Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance.
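A rule-based reward of the kind described for the second stage might look like the following sketch. The two terms, the thresholds (a 15 m/s speed limit, a 2 m safety radius, a 0.1 s timestep) and the name rule_based_reward are assumptions chosen for illustration; they are not the paper's actual reward terms or weights.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rule_based_reward(
    ego: List[Point],
    others: List[List[Point]],
    dt: float = 0.1,
    speed_limit: float = 15.0,
    safety_radius: float = 2.0,
) -> float:
    """Score one sampled ego trajectory against simple planning rules.

    Hypothetical terms (not the paper's exact reward):
      * collision: -1 if the ego comes within `safety_radius` metres of any
        other agent at the same timestep, else 0;
      * speed limit: minus the fraction of steps whose speed (m/s)
        exceeds `speed_limit`.
    """
    # Compare ego and each other agent timestep by timestep.
    collided = any(
        math.dist(p, q) < safety_radius
        for agent in others
        for p, q in zip(ego, agent)
    )
    # Speed of each step = travelled distance / timestep duration.
    steps = list(zip(ego, ego[1:]))
    speeding = sum(1 for a, b in steps if math.dist(a, b) / dt > speed_limit)
    speed_penalty = speeding / len(steps) if steps else 0.0
    return -(1.0 if collided else 0.0) - speed_penalty
```

Because the rules are explicit functions of the trajectory, the reward needs no human labels, which is what allows the second stage to move the planner away from unsafe habits in the demonstration data.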

Problem

Research questions and friction points this paper is trying to address.

Ensuring safe and feasible autonomous driving trajectory planning

Overcoming reliance on unsafe human driving demonstrations

Aligning trajectory predictions with safety and traffic rules

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage trajectory planning framework

Autoregressive trajectory predictor training

Rule-based rewards with GRPO fine-tuning
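The GRPO fine-tuning step can be sketched as follows, assuming the standard GRPO recipe: several trajectories are sampled for the same scene, their rewards are standardized within that group to form advantages, and a PPO-style clipped surrogate is averaged over tokens. The function names, clip_eps = 0.2, and the omission of the KL penalty toward a reference policy are simplifications for illustration, not details confirmed by the paper.

```python
import math
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: standardise rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def grpo_surrogate_loss(
    logp_new: List[List[float]],
    logp_old: List[List[float]],
    rewards: List[float],
    clip_eps: float = 0.2,
) -> float:
    """Negative clipped surrogate, averaged over tokens, then over the group.

    logp_new[i][t] and logp_old[i][t] are the log-probabilities of motion
    token t of sample i under the current and the sampling policy. The KL
    penalty to a reference policy used in GRPO is omitted for brevity.
    """
    advantages = group_relative_advantages(rewards)
    per_sample = []
    for new, old, adv in zip(logp_new, logp_old, advantages):
        terms = []
        for ln, lo in zip(new, old):
            ratio = math.exp(ln - lo)  # importance ratio for this token
            clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
            terms.append(min(ratio * adv, clipped * adv))
        per_sample.append(sum(terms) / len(terms))
    return -sum(per_sample) / len(per_sample)
```

Because advantages are computed relative to the group mean, no learned value function is needed: a sampled trajectory is reinforced only if it scores better under the rule-based reward than the other trajectories sampled for the same scene.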
