🤖 AI Summary
To address inefficient training and policy instability in reinforcement learning (RL)-based trajectory planning for large-scale real-world driving scenarios, this paper proposes CarPlanner, a Consistent auto-regressive Planner. CarPlanner integrates autoregressive trajectory modeling with a generation-selection mechanism, incorporates an invariant-view module to enhance representation robustness, and introduces an expert-guided reward function for high-quality policy optimization. For the first time, an RL-based planner achieves comprehensive superiority over both imitation learning (IL)-based and rule-based state-of-the-art methods on the nuPlan benchmark, significantly improving planning performance, training efficiency, and policy stability, particularly in multi-modal and temporally coherent trajectory generation. The core innovation lies in the synergistic design of a consistency-enforced autoregressive architecture, an invariant-view module, and an expert-guided reward, establishing a scalable and reliable paradigm for RL-driven autonomous driving planning.
📝 Abstract
Trajectory planning is vital for autonomous driving, ensuring safe and efficient navigation in complex environments. While recent learning-based methods, particularly reinforcement learning (RL), have shown promise in specific scenarios, RL planners struggle with training inefficiency and with managing large-scale, real-world driving scenarios. In this paper, we introduce CarPlanner, a Consistent auto-regressive Planner that uses RL to generate multi-modal trajectories. The auto-regressive structure enables efficient large-scale RL training, while the consistency mechanism stabilizes policy learning by maintaining temporal coherence across time steps. Moreover, CarPlanner employs a generation-selection framework with an expert-guided reward function and an invariant-view module, simplifying RL training and enhancing policy performance. Extensive analysis demonstrates that our proposed RL framework effectively addresses the challenges of training efficiency and performance enhancement, positioning CarPlanner as a promising solution for trajectory planning in autonomous driving. To the best of our knowledge, CarPlanner is the first RL-based planner to surpass both IL- and rule-based state-of-the-art (SOTA) methods on the challenging large-scale real-world nuPlan dataset, outperforming RL-, IL-, and rule-based SOTA approaches alike.
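To make the generation-selection idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual model): several trajectory "modes" are rolled out autoregressively, one time step at a time conditioned on the states generated so far, then a toy expert-guided reward scores each candidate and the best one is selected. All function names, the per-mode heading bias, and the distance-based reward are illustrative assumptions, not CarPlanner's implementation.

```python
def generate_candidates(state, num_modes=3, horizon=4, step=1.0):
    """Autoregressively roll out several candidate trajectories (modes).

    Each mode extends its trajectory one step at a time, conditioning only
    on previously generated states -- the autoregressive structure.
    The per-mode lateral bias is a hypothetical stand-in for a learned policy.
    """
    candidates = []
    for mode in range(num_modes):
        traj = [state]
        bias = mode - (num_modes - 1) / 2  # e.g. veer left / straight / right
        for _ in range(horizon):
            x, y = traj[-1]  # next state depends on the last generated state
            traj.append((x + step, y + 0.5 * bias))
        candidates.append(traj)
    return candidates

def expert_guided_reward(traj, expert_traj):
    """Toy expert-guided reward: negative mean L1 distance to an expert demo."""
    dists = [abs(a[0] - b[0]) + abs(a[1] - b[1])
             for a, b in zip(traj, expert_traj)]
    return -sum(dists) / len(dists)

def plan(state, expert_traj):
    """Generate-then-select: propose multi-modal candidates, keep the best."""
    candidates = generate_candidates(state)
    return max(candidates, key=lambda t: expert_guided_reward(t, expert_traj))

expert = [(float(i), 0.0) for i in range(5)]  # a straight expert demonstration
best = plan((0.0, 0.0), expert)
```

In this sketch the straight mode matches the expert exactly and is selected; in the paper the generation and scoring are learned, but the separation into multi-modal proposal and reward-based selection is the same high-level structure.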