🤖 AI Summary
This work addresses the mode collapse problem and the inherent trade-off between trajectory diversity and accuracy in diffusion-based motion planning for autonomous driving. We propose TransDiffuser, an end-to-end generative model that conditions on multimodal scene inputs (camera images, LiDAR point clouds, and navigation commands) and directly outputs high-quality, diverse candidate trajectories via a Transformer encoder and diffusion decoder architecture. Our key contribution is a multimodal representation decorrelation optimization mechanism: by regularizing the latent feature space toward decorrelated dimensions during training, the model improves trajectory diversity and accuracy jointly, without relying on anchor trajectory priors. Evaluated on the NAVSIM benchmark, TransDiffuser achieves a PDM Score (PDMS) of 94.85, significantly outperforming prior state-of-the-art anchor-free methods.
📝 Abstract
In recent years, diffusion models have shown their potential across diverse domains, from vision generation to language modeling. Transferring these capabilities to modern autonomous driving systems has also emerged as a promising direction. In this work, we propose TransDiffuser, an encoder-decoder generative trajectory planning model for end-to-end autonomous driving. The encoded scene information serves as the multi-modal conditional input of the denoising decoder. To tackle the mode collapse dilemma in generating high-quality, diverse trajectories, we introduce a simple yet effective multi-modal representation decorrelation optimization mechanism during training. TransDiffuser achieves a PDMS of 94.85 on the NAVSIM benchmark, surpassing previous state-of-the-art methods without using any anchor-based prior trajectories.
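The abstract does not spell out the decorrelation objective, but a common way to realize such a mechanism is to penalize the off-diagonal entries of the covariance matrix of the latent features, so that individual feature dimensions carry non-redundant information. The sketch below is an illustrative PyTorch implementation of that generic idea, not the paper's exact loss; the function name `decorrelation_loss` and the weighting are assumptions.

```python
import torch

def decorrelation_loss(features: torch.Tensor) -> torch.Tensor:
    """Generic feature-decorrelation regularizer (illustrative, not the
    paper's exact formulation).

    Penalizes the squared off-diagonal entries of the covariance matrix of
    `features` (shape: batch x dim), encouraging latent dimensions to stay
    decorrelated across the batch.
    """
    # Center the features over the batch dimension
    centered = features - features.mean(dim=0, keepdim=True)
    batch, dim = centered.shape
    # Empirical covariance matrix, shape (dim, dim)
    cov = centered.T @ centered / (batch - 1)
    # Zero out the diagonal so only cross-dimension correlations are penalized
    off_diag = cov - torch.diag(torch.diagonal(cov))
    return (off_diag ** 2).sum() / dim
```

In practice a term like this would be added to the main planning (denoising) loss with a small weight, e.g. `total_loss = denoise_loss + 0.01 * decorrelation_loss(latents)`, where the weight is a tunable hyperparameter.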