🤖 AI Summary
To address two key limitations in end-to-end autonomous driving trajectory planning—(1) the lack of scene-adaptive trajectory priors and (2) the absence of policy-driven optimization in trajectory evaluation—this paper proposes a Scene-Adaptive Mixture-of-Experts (MoE) framework. First, a multimodal perception backbone identifies the driving scene and dynamically routes inputs to specialized expert subnetworks, enabling scene-specific trajectory prior modeling. Second, reinforcement learning is integrated to fine-tune the trajectory scoring policy, overcoming the constraints of single-stage supervised training. The method jointly leverages the MoE architecture, multi-source sensor fusion, and policy-driven evaluation. Evaluated on the NavSim benchmark (ICCV challenge), the integrated model achieves a score of 51.08, ranking third, indicating improved planning robustness in complex traffic scenarios.
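The scene-adaptive routing described above can be illustrated with a minimal sketch: a gating network scores the scene, and the top-scoring expert's trajectory prior is selected (or blended, for top-k > 1). All names here (`Expert`, `route`, the fixed anchor trajectories) are hypothetical stand-ins, not the paper's actual implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

class Expert:
    """One scene-specialized trajectory-prior head (hypothetical stand-in).

    A real expert would be a subnetwork; here it just returns a fixed
    anchor trajectory as a list of (x, y) waypoints."""
    def __init__(self, anchor):
        self.anchor = anchor

    def __call__(self):
        return self.anchor

def route(scene_logits, experts, top_k=1):
    """Scene-adaptive MoE routing: keep the top-k experts by gate weight
    and return the gate-weighted average of their trajectory priors."""
    gates = softmax(scene_logits)
    ranked = sorted(range(len(experts)), key=lambda i: -gates[i])[:top_k]
    norm = sum(gates[i] for i in ranked)
    n_pts = len(experts[0]())
    prior = [(0.0, 0.0)] * n_pts
    for i in ranked:
        w = gates[i] / norm  # renormalize over the selected experts
        traj = experts[i]()
        prior = [(px + w * tx, py + w * ty)
                 for (px, py), (tx, ty) in zip(prior, traj)]
    return prior

# Two toy experts: "go straight" vs. "turn left".
experts = [Expert([(1.0, 0.0), (2.0, 0.0)]),
           Expert([(1.0, 0.5), (2.0, 1.5)])]
# Gating logits favor expert 0, so its prior is returned unchanged.
prior = route([2.0, 0.1], experts, top_k=1)
```

With `top_k=1` the renormalized gate weight of the selected expert is exactly 1.0, so the routed prior equals that expert's anchor trajectory.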
📝 Abstract
Current autonomous driving systems often favor end-to-end frameworks, which take sensor inputs such as images and learn to map them into trajectory space via neural networks. Previous work has shown that models achieve better planning performance when provided with a prior distribution over possible trajectories. However, these approaches often overlook two critical aspects: 1) The appropriate trajectory prior can vary significantly across driving scenarios. 2) Their trajectory evaluation mechanism lacks policy-driven refinement, remaining constrained by the limitations of one-stage supervised training. To address these issues, we explore improvements in two key areas. For problem 1, we employ a Mixture-of-Experts (MoE) architecture to apply different trajectory priors tailored to different scenarios. For problem 2, we use reinforcement learning to fine-tune the trajectory scoring mechanism. Additionally, we integrate models with different perception backbones to strengthen perceptual features. Our integrated model achieved a score of 51.08 on the NavSim ICCV benchmark, securing third place.
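The RL fine-tuning of the trajectory scorer can be sketched as a simple policy-gradient (REINFORCE) loop: the scorer induces a softmax distribution over candidate trajectories, one is sampled, and a simulator-style reward pushes the scorer toward high-reward picks. The `TrajectoryScorer` class, its linear feature scoring, and the toy reward are all hypothetical simplifications of the paper's approach, not its actual training setup.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

class TrajectoryScorer:
    """Linear scorer over hand-made trajectory features (hypothetical)."""
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def logits(self, feats):
        return [sum(wi * fi for wi, fi in zip(self.w, f)) for f in feats]

    def reinforce_step(self, feats, reward_fn):
        """One REINFORCE update: sample a candidate from the softmax over
        scores, then scale the log-prob gradient by the reward."""
        probs = softmax(self.logits(feats))
        idx = random.choices(range(len(feats)), weights=probs)[0]
        r = reward_fn(idx)
        # d/dw log p(idx) = f_idx - E_p[f]
        for k in range(len(self.w)):
            expect = sum(p * f[k] for p, f in zip(probs, feats))
            self.w[k] += self.lr * r * (feats[idx][k] - expect)
        return idx, r

# Toy setup: two candidate trajectories with one-hot features;
# only candidate 0 earns reward (e.g. collision-free in simulation).
random.seed(0)
feats = [[1.0, 0.0], [0.0, 1.0]]
reward = lambda i: 1.0 if i == 0 else 0.0
scorer = TrajectoryScorer(n_features=2)
for _ in range(200):
    scorer.reinforce_step(feats, reward)
```

After training, the scorer's weight for the rewarded candidate's feature dominates, so the softmax concentrates on the safe trajectory; this is the policy-driven refinement that a one-stage supervised scorer lacks.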