Generalized Trajectory Scoring for End-to-end Multimodal Planning

📅 2025-06-07

🤖 AI Summary

Existing trajectory scorers generalize poorly. Scorers over large static candidate sets lack fine-grained adaptation, and scorers over small sets of dynamically generated trajectories fail to capture the broader trajectory distribution. To address this, the paper proposes GTRS, a trajectory scoring framework for end-to-end multimodal autonomous driving planning. It combines three components: (1) a diffusion-based trajectory generator that produces diverse, fine-grained proposals; (2) a super-dense trajectory vocabulary trained with vocabulary dropout, so the scorer stays robust when inferring on smaller candidate subsets; and (3) sensor augmentation with refinement training, which improves out-of-domain generalization and discrimination among critical trajectories. The method won the Navsim v2 Challenge, and even with sub-optimal sensor inputs its performance approaches that of privileged methods relying on ground-truth perception.
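The vocabulary-dropout idea (train on a super-dense vocabulary, then infer on smaller subsets) can be illustrated with a minimal sketch. The summary does not give the dropout rate, the training target, or the loss, so everything below is an assumption: a uniform random subset is kept at each training step, and the loss is a cross-entropy whose target is the kept candidate closest to the expert trajectory. `sample_vocab_subset` and `dropout_imitation_loss` are hypothetical names, not GTRS code.

```python
import math
import random
from typing import List, Sequence, Tuple

Waypoint = Tuple[float, float]
Trajectory = Sequence[Waypoint]


def sample_vocab_subset(vocab_size: int, keep_ratio: float, rng: random.Random) -> List[int]:
    """Hypothetical vocabulary dropout: keep a random subset of candidate indices."""
    if vocab_size < 1 or not 0.0 < keep_ratio <= 1.0:
        raise ValueError("need vocab_size >= 1 and keep_ratio in (0, 1]")
    k = max(1, round(vocab_size * keep_ratio))
    return sorted(rng.sample(range(vocab_size), k))


def mean_l2(a: Trajectory, b: Trajectory) -> float:
    """Average Euclidean distance between corresponding waypoints."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def dropout_imitation_loss(
    logits: Sequence[float],
    vocab: Sequence[Trajectory],
    expert: Trajectory,
    keep_idx: Sequence[int],
) -> float:
    """Cross-entropy over the kept candidates only (assumed training target).

    The target is the kept candidate closest to the expert trajectory, so the
    scorer must rank correctly inside whatever subset survives dropout.
    """
    kept = [logits[i] for i in keep_idx]
    target = min(range(len(keep_idx)), key=lambda j: mean_l2(vocab[keep_idx[j]], expert))
    m = max(kept)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in kept))
    return log_z - kept[target]


# Toy usage: an 8-candidate vocabulary, half of it dropped for this step.
rng = random.Random(0)
vocab = [((0.0, 0.0), (float(i), 1.0)) for i in range(8)]
expert = ((0.0, 0.0), (3.0, 1.0))
logits = [0.1 * i for i in range(8)]  # stand-in for learned scorer outputs
keep = sample_vocab_subset(len(vocab), keep_ratio=0.5, rng=rng)
loss = dropout_imitation_loss(logits, vocab, expert, keep)
```

Because the loss is normalized only over the surviving candidates, the same scorer can be applied at inference time to a vocabulary of any size, which is the property the summary attributes to this component.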

📝 Abstract

End-to-end multi-modal planning is a promising paradigm in autonomous driving, enabling decision-making with diverse trajectory candidates. A key component is a robust trajectory scorer capable of selecting the optimal trajectory from these candidates. While recent trajectory scorers focus on scoring either large sets of static trajectories or small sets of dynamically generated ones, both approaches face significant limitations in generalization. Static vocabularies provide effective coarse discretization but struggle with fine-grained adaptation, while dynamic proposals offer detailed precision but fail to capture broader trajectory distributions. To overcome these challenges, we propose GTRS (Generalized Trajectory Scoring), a unified framework for end-to-end multi-modal planning that combines coarse and fine-grained trajectory evaluation. GTRS consists of three complementary innovations: (1) a diffusion-based trajectory generator that produces diverse fine-grained proposals; (2) a vocabulary generalization technique that trains a scorer on super-dense trajectory sets with dropout regularization, enabling robust inference on smaller subsets; and (3) a sensor augmentation strategy that enhances out-of-domain generalization while incorporating refinement training for critical trajectory discrimination. As the winning solution of the Navsim v2 Challenge, GTRS demonstrates superior performance even with sub-optimal sensor inputs, approaching privileged methods that rely on ground-truth perception. Code will be available at https://github.com/NVlabs/GTRS.
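One simple way to combine coarse and fine-grained evaluation is to pool the static vocabulary and the dynamic proposals into a single candidate set and let one scorer pick the best entry. The sketch below assumes the scorer assigns each candidate an independent scalar score, so the pooled set can have any size; it is not claimed to be how GTRS fuses its components. `select_trajectory` is a hypothetical name, and `toy_score` is a hand-written stand-in for a learned scorer (reward forward progress, penalize lateral offset), not GTRS's model.

```python
from typing import Callable, List, Sequence, Tuple

Waypoint = Tuple[float, float]
Trajectory = Tuple[Waypoint, ...]


def select_trajectory(
    static_vocab: Sequence[Trajectory],
    dynamic_proposals: Sequence[Trajectory],
    score_fn: Callable[[Trajectory], float],
) -> Tuple[Trajectory, float]:
    """Score the union of coarse (static) and fine (dynamic) candidates; return the best."""
    candidates: List[Trajectory] = list(static_vocab) + list(dynamic_proposals)
    if not candidates:
        raise ValueError("no candidates to score")
    best = max(candidates, key=score_fn)
    return best, score_fn(best)


def toy_score(traj: Trajectory) -> float:
    """Hypothetical scorer: final x-progress minus a penalty on mean lateral offset."""
    progress = traj[-1][0]
    lateral = sum(abs(y) for _, y in traj) / len(traj)
    return progress - 2.0 * lateral


# Coarse vocabulary entries plus one fine-grained proposal (e.g. from a generator).
static_vocab = [((0.0, 0.0), (5.0, 1.0)), ((0.0, 0.0), (8.0, 2.0))]
dynamic_proposals = [((0.0, 0.0), (7.0, 0.2))]
best, score = select_trajectory(static_vocab, dynamic_proposals, toy_score)
```

Here the fine-grained proposal scores about 6.8 and beats both vocabulary entries (4.0 and 6.0), illustrating how a dynamic candidate can refine a coarse discretization when both are ranked by the same scorer.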

Problem

Overcoming limitations in generalization for trajectory scoring in autonomous driving

Combining coarse and fine-grained trajectory evaluation in multimodal planning

Enhancing robustness with sub-optimal sensor inputs and out-of-domain generalization

Innovation

Diffusion-based trajectory generator for diverse proposals

Vocabulary generalization with super-dense training

Sensor augmentation for out-of-domain generalization