TrajDiff: End-to-end Autonomous Driving without Perception Annotation

📅 2025-11-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the bottleneck of reliance on costly manual perception annotations in end-to-end autonomous driving, this paper proposes TrajDiff, the first fully annotation-free generative driving framework. Methodologically, it introduces a trajectory-oriented BEV diffusion paradigm comprising an unsupervised TrajBEV encoder and a Trajectory-oriented BEV Diffusion Transformer (TB-DiT). Instead of using perception labels or handcrafted motion priors, TrajDiff directly models the multimodal distribution of future ego-vehicle trajectories conditioned solely on raw sensor inputs and ego-state, representing driving modes as Gaussian BEV heatmaps. Crucially, it eliminates explicit perception modules and engineered motion assumptions. Evaluated on NAVSIM, TrajDiff achieves 87.5 PDMS, improving to 88.5 PDMS with data scaling, surpassing all prior annotation-free methods and matching state-of-the-art perception-dependent approaches.

📝 Abstract
End-to-end autonomous driving systems generate driving policies directly from raw sensor inputs. While such systems can extract effective environmental features for planning by relying on auxiliary perception tasks, the high cost of manual perception annotation makes annotation-free planning paradigms increasingly critical. In this work, we propose TrajDiff, a Trajectory-oriented BEV Conditioned Diffusion framework that establishes a fully perception annotation-free generative method for end-to-end autonomous driving. TrajDiff requires only raw sensor inputs and future trajectories, from which it constructs Gaussian BEV heatmap targets that inherently capture driving modalities. We design a simple yet effective trajectory-oriented BEV encoder that extracts TrajBEV features without perceptual supervision. Furthermore, we introduce the Trajectory-oriented BEV Diffusion Transformer (TB-DiT), which leverages ego-state information and the predicted TrajBEV features to directly generate diverse yet plausible trajectories, eliminating the need for handcrafted motion priors. Beyond these architectural innovations, TrajDiff enables exploration of data-scaling benefits in the annotation-free setting. Evaluated on the NAVSIM benchmark, TrajDiff achieves 87.5 PDMS, establishing state-of-the-art performance among annotation-free methods. With data scaling, it further improves to 88.5 PDMS, comparable to advanced perception-based approaches. Our code and model will be made publicly available.
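The Gaussian BEV heatmap targets mentioned in the abstract can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the grid size, BEV range, and `sigma` are assumed values, and the paper may rasterize waypoints differently.

```python
import numpy as np

def gaussian_bev_heatmap(trajectory, grid_size=128, bev_range=32.0, sigma=1.0):
    """Render future ego waypoints as a Gaussian heatmap on a BEV grid.

    trajectory: (T, 2) array of (x, y) waypoints in ego-centric meters,
    covering [-bev_range, bev_range] in both axes. The heatmap is the
    per-cell maximum over per-waypoint Gaussians, so values lie in [0, 1].
    """
    res = (2.0 * bev_range) / grid_size  # meters per cell
    ys, xs = np.meshgrid(np.arange(grid_size), np.arange(grid_size), indexing="ij")
    # Cell centers in ego-centric coordinates.
    cx = (xs + 0.5) * res - bev_range
    cy = (ys + 0.5) * res - bev_range
    heatmap = np.zeros((grid_size, grid_size), dtype=np.float32)
    for x, y in trajectory:
        g = np.exp(-((cx - x) ** 2 + (cy - y) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)
    return heatmap
```

Taking the per-cell maximum rather than the sum keeps the target bounded regardless of how many waypoints fall in one cell; a sum-based variant would instead emphasize dwell time.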
Problem

Research questions and friction points this paper is trying to address.

Develops perception annotation-free end-to-end autonomous driving
Generates diverse plausible trajectories without handcrafted motion priors
Explores data scaling benefits in annotation-free autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory-oriented BEV encoder without perceptual supervision
Trajectory-oriented BEV Diffusion Transformer for diverse trajectory generation
Fully perception annotation-free generative method for autonomous driving
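To illustrate how a conditioned diffusion model can generate diverse trajectories without handcrafted motion priors, here is a minimal DDPM-style ancestral sampling loop over waypoints. It is a generic sketch, not TB-DiT itself: `denoise_fn` stands in for the learned transformer, `cond` for the ego-state and TrajBEV conditioning, and the linear noise schedule is an assumption.

```python
import numpy as np

def ddpm_sample_trajectory(denoise_fn, cond, horizon=8, steps=50, seed=0):
    """Sample a (horizon, 2) waypoint trajectory by ancestral DDPM denoising.

    denoise_fn(x_t, t, cond) -> predicted noise; a placeholder for a learned
    denoiser such as TB-DiT conditioned on ego-state and BEV features.
    """
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)  # assumed linear schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal((horizon, 2))   # start from pure noise
    for t in reversed(range(steps)):
        eps = denoise_fn(x, t, cond)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added at the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x
```

Sampling with different seeds under the same conditioning is what yields the "diverse yet plausible" multimodal trajectories the bullets describe.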
👥 Authors
Xingtai Gui (SKL-IOTSC, CIS, University of Macau)
Jianbo Zhao (University of Science and Technology of China)
Wencheng Han (University of Macau)
Jikai Wang (University of Texas at Dallas)
Jiahao Gong (Mach Drive)
Feiyang Tan (Mach Drive)
Cheng-zhong Xu (SKL-IOTSC, CIS, University of Macau)
Jianbing Shen (Professor, University of Macau)