WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-tail Scenarios

📅 2025-10-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing end-to-end (E2E) driving benchmarks mostly cover nominal scenarios, and conventional open-loop metrics, which measure the distance between predicted waypoints and the logs, do not capture the multi-modal nature of driving or performance in rare conditions. Method: The paper introduces WOD-E2E, a dataset of 4,021 driving segments (approximately 12 hours) curated for challenging long-tail scenarios that occur with a frequency of less than 0.03%. Each segment provides high-level routing information, ego states, and 360-degree views from 8 surrounding cameras. The paper also proposes an open-loop metric, the Rater Feedback Score (RFS), which measures how closely a predicted trajectory matches rater-annotated trajectory preference labels. Contribution/Results: Rater preference labels are released for all validation set segments, and the held-out test set labels were used for the 2025 WOD-E2E Challenge. The stated goal is to support research into generalizable, robust, and safe E2E driving agents.

📝 Abstract

Vision-based end-to-end (E2E) driving has garnered significant interest in the research community due to its scalability and synergy with multimodal large language models (MLLMs). However, current E2E driving benchmarks primarily feature nominal scenarios, failing to adequately test the true potential of these systems. Furthermore, existing open-loop evaluation metrics often fall short in capturing the multi-modal nature of driving or effectively evaluating performance in long-tail scenarios. To address these gaps, we introduce the Waymo Open Dataset for End-to-End Driving (WOD-E2E). WOD-E2E contains 4,021 driving segments (approximately 12 hours), specifically curated for challenging long-tail scenarios that are rare in daily life, occurring with a frequency of less than 0.03%. Concretely, each segment in WOD-E2E includes high-level routing information, ego states, and 360-degree camera views from 8 surrounding cameras. To evaluate E2E driving performance in these long-tail situations, we propose a novel open-loop evaluation metric: Rater Feedback Score (RFS). Unlike conventional metrics that measure the distance between predicted waypoints and the logs, RFS measures how closely the predicted trajectory matches rater-annotated trajectory preference labels. We have released rater preference labels for all WOD-E2E validation set segments, while the held-out test set labels have been used for the 2025 WOD-E2E Challenge. Through our work, we aim to foster state-of-the-art research into generalizable, robust, and safe end-to-end autonomous driving agents capable of handling complex real-world situations.

Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of challenging long-tail scenarios in E2E driving benchmarks

Improving evaluation metrics for multimodal driving performance assessment

Enabling robust autonomous driving in rare real-world situations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset for end-to-end driving in rare scenarios

Uses the Rater Feedback Score (RFS), based on rater-annotated trajectory preferences, as an open-loop evaluation metric

Includes high-level routing and 360-degree camera views
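
As a rough picture of what one segment carries, the sketch below models the three components named above: routing information, ego states, and views from 8 cameras. It is not the dataset's actual schema. Every class and field name here is hypothetical, and only the camera count comes from the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_CAMERAS = 8  # the abstract states 8 surrounding cameras


@dataclass
class EgoState:
    # Hypothetical fields; the real ego-state contents are not given in the source.
    timestamp_s: float
    position_xy_m: Tuple[float, float]
    speed_mps: float


@dataclass
class Segment:
    # Hypothetical container for one driving segment.
    segment_id: str
    routing_command: str            # hypothetical encoding, e.g. "go_straight"
    ego_states: List[EgoState]
    camera_frames: List[List[str]]  # one list of frame paths per camera

    def __post_init__(self) -> None:
        # Enforce the one property the source does state: 8 camera views.
        if len(self.camera_frames) != NUM_CAMERAS:
            raise ValueError(f"expected {NUM_CAMERAS} camera streams")

    def duration_s(self) -> float:
        """Time spanned by the recorded ego states, in seconds."""
        if len(self.ego_states) < 2:
            return 0.0
        return self.ego_states[-1].timestamp_s - self.ego_states[0].timestamp_s
```

For scale, 4,021 segments over approximately 12 hours works out to an average of about 10.7 seconds per segment (12 × 3600 / 4021), though the 12-hour figure is itself approximate and individual segment lengths are not given here.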
