🤖 AI Summary
To address safety risks in autonomous driving trajectory planning, such as speeding and collisions, that arise from overreliance on expert demonstrations, this paper proposes Plan-R1, a two-stage framework that treats trajectory planning as a language-model-style sequential prediction task aligned with explicit planning principles. First, trajectories are discretized into motion tokens and an autoregressive trajectory predictor is trained on expert data via next-motion-token prediction. Second, the model is fine-tuned with Group Relative Policy Optimization (GRPO) against explicit rule-based rewards (e.g., collision avoidance and speed-limit compliance), which reduces its reliance on imitating human driving behavior. The key contribution is the formulation of trajectory planning as a principle-aligned sequential prediction task, coupled with a rule-guided GRPO fine-tuning framework. Evaluated on the nuPlan benchmark, the approach reduces collision rate by 42% and traffic-rule violations by 37%, achieving state-of-the-art performance.
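The idea of discretizing a trajectory into motion tokens can be illustrated with a minimal sketch. The summary does not specify the token vocabulary, so everything here is an assumption made for illustration: the uniform grid of per-step displacements, the names `build_vocab`, `tokenize`, and `detokenize`, and the `step` and `max_disp` parameters are all hypothetical and are not taken from the paper.

```python
# Hypothetical motion-token vocabulary: each token is a (dx, dy) displacement
# over one time step, laid out on a uniform grid. This is an illustrative
# assumption, not the vocabulary used by the paper.
def build_vocab(step=0.5, max_disp=2.0):
    n = int(round(max_disp / step))
    return [(i * step, j * step)
            for i in range(-n, n + 1)
            for j in range(-n, n + 1)]


def tokenize(trajectory, vocab):
    """Map each step of a list of (x, y) points to its nearest displacement token."""
    tokens = []
    x, y = trajectory[0]
    for tx, ty in trajectory[1:]:
        dx, dy = tx - x, ty - y
        idx = min(
            range(len(vocab)),
            key=lambda k: (vocab[k][0] - dx) ** 2 + (vocab[k][1] - dy) ** 2,
        )
        tokens.append(idx)
        # Advance from the reconstructed position so that quantization
        # error does not accumulate across steps.
        x, y = x + vocab[idx][0], y + vocab[idx][1]
    return tokens


def detokenize(start, tokens, vocab):
    """Rebuild a trajectory from a start point and a token sequence."""
    points = [start]
    x, y = start
    for idx in tokens:
        x, y = x + vocab[idx][0], y + vocab[idx][1]
        points.append((x, y))
    return points


vocab = build_vocab()
traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 0.5)]
tokens = tokenize(traj, vocab)
recovered = detokenize(traj[0], tokens, vocab)
```

Once a trajectory is a sequence of integer tokens like this, an autoregressive model can be trained on it with an ordinary next-token objective, which is the sense in which planning becomes a sequential prediction task.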
📝 Abstract
Safe and feasible trajectory planning is essential for real-world autonomous driving systems. However, existing learning-based planning methods often rely on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting unsafe behaviors, such as speeding, from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a novel two-stage trajectory planning framework that formulates trajectory planning as a sequential prediction task, guided by explicit planning principles such as safety, comfort, and traffic rule compliance. In the first stage, we train an autoregressive trajectory predictor via next-motion-token prediction on expert data. In the second stage, we design rule-based rewards (e.g., collision avoidance, speed limits) and fine-tune the model using Group Relative Policy Optimization (GRPO), a reinforcement learning strategy, to align its predictions with these planning principles. Experiments on the nuPlan benchmark demonstrate that Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance.
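The second stage can be sketched roughly as follows: sample a group of candidate trajectories, score each with rule-based rewards, and normalize the scores within the group, which is the core of GRPO's group-relative advantage. This is a simplified illustration under stated assumptions, not the paper's implementation. The reward terms, their weights, the `safe_radius` and `dt` parameters, and the function names `rule_reward` and `group_relative_advantages` are all hypothetical; the abstract names collision avoidance and speed limits as examples but does not give their exact form.

```python
import math
from statistics import mean, pstdev


def rule_reward(trajectory, obstacles, speed_limit, dt=0.5, safe_radius=1.0):
    """Hypothetical rule-based reward: -1 for any collision, -1 for any speeding."""
    reward = 0.0
    # Collision rule: any trajectory point closer than safe_radius to an obstacle.
    if any(math.dist(p, o) < safe_radius for p in trajectory for o in obstacles):
        reward -= 1.0
    # Speed rule: speed between consecutive points must not exceed the limit.
    speeds = [math.dist(a, b) / dt for a, b in zip(trajectory, trajectory[1:])]
    if any(v > speed_limit for v in speeds):
        reward -= 1.0
    return reward


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of three sampled candidates for the same (made-up) scene.
obstacles = [(2.0, 3.0)]
group = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # slow and clear of the obstacle
    [(0.0, 0.0), (4.0, 0.0), (8.0, 0.0)],  # exceeds the speed limit
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.5)],  # passes too close to the obstacle
]
rewards = [rule_reward(t, obstacles, speed_limit=5.0) for t in group]
advantages = group_relative_advantages(rewards)
```

Because the advantage is computed relative to the other samples in the same group, no separate learned value function is needed; candidates that violate fewer rules than their peers receive a positive advantage and are reinforced.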