🤖 AI Summary
Existing end-to-end autonomous driving (E2EAD) systems lack explicit modeling of personalized driving styles, primarily due to the absence of large-scale, fine-grained, real-world driving preference datasets.
Method: This paper introduces a driving-style-aware E2EAD benchmark, built on the first large-scale, real-world dataset with comprehensive annotations integrating static environments and dynamic semantics. We propose a vision-language model (VLM) framework that jointly performs scene understanding and behavior inference. To ensure annotation fidelity, we integrate road topology extraction, behavioral distribution analysis, rule-based heuristics, and human-in-the-loop validation to fuse objective and subjective preference labels with high accuracy.
Contribution/Results: Experiments demonstrate that incorporating personalized driving preferences significantly improves behavioral alignment between model outputs and human drivers. Our benchmark establishes a critical foundation—both data and evaluation protocols—for human-centered autonomous driving research.
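To make the notion of "preference conditioning" concrete, here is a minimal sketch of how a planner might consume a discrete driving-style label alongside scene features. All module names, dimensions, and the three-way style taxonomy are illustrative assumptions, not details from the paper:

```python
import torch
import torch.nn as nn

class PreferenceConditionedPlanner(nn.Module):
    """Hypothetical planner head: concatenates a learned driving-style
    embedding with scene features before predicting waypoints."""

    def __init__(self, scene_dim=256, n_styles=3, horizon=6):
        super().__init__()
        self.horizon = horizon
        # e.g. 0 = conservative, 1 = moderate, 2 = aggressive (assumed taxonomy)
        self.style_emb = nn.Embedding(n_styles, 32)
        self.head = nn.Sequential(
            nn.Linear(scene_dim + 32, 128),
            nn.ReLU(),
            nn.Linear(128, horizon * 2),  # (x, y) per future timestep
        )

    def forward(self, scene_feat, style_id):
        # scene_feat: (B, scene_dim); style_id: (B,) long tensor
        z = torch.cat([scene_feat, self.style_emb(style_id)], dim=-1)
        return self.head(z).view(-1, self.horizon, 2)

planner = PreferenceConditionedPlanner()
waypoints = planner(torch.randn(4, 256), torch.tensor([0, 1, 2, 0]))
# waypoints has shape (4, 6, 2): a 6-step (x, y) trajectory per sample
```

The same scene input can thus yield different trajectories depending on the style label, which is what the benchmark's with/without-conditioning comparison evaluates.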
📝 Abstract
While personalization has been explored in traditional autonomous driving systems, it remains largely overlooked in end-to-end autonomous driving (E2EAD), despite the paradigm's growing prominence. This gap is critical, as user-aligned behavior is essential for trust, comfort, and widespread adoption of autonomous vehicles. A core challenge is the lack of large-scale real-world datasets annotated with diverse and fine-grained driving preferences, hindering the development and evaluation of personalized E2EAD models. In this work, we present the first large-scale real-world dataset enriched with annotations capturing diverse driving preferences, establishing a foundation for personalization in E2EAD. We extract static environmental features from real-world road topology and infer dynamic contextual cues using a fine-tuned vision-language model (VLM), enabling consistent and fine-grained scenario construction. Based on these scenarios, we derive objective preference annotations through behavioral distribution analysis and rule-based heuristics. To address the inherent subjectivity of driving style, we further employ the VLM to generate subjective annotations by jointly modeling scene semantics and driver behavior. Final high-quality labels are obtained through a human-in-the-loop verification process that fuses both perspectives. Building on this dataset, we propose the first benchmark for evaluating personalized E2EAD models. We assess several state-of-the-art models with and without preference conditioning, demonstrating that incorporating personalized preferences results in behavior more aligned with human driving. Our work lays the foundation for personalized E2EAD by providing a standardized platform to systematically integrate human preferences into data-driven E2EAD systems, catalyzing future research in human-centric autonomy.
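The two-track annotation scheme described above (rule-based objective labels, VLM-derived subjective labels, human review where they disagree) can be sketched as follows. The thresholds, field names, and agreement rule are hypothetical placeholders, not the paper's actual heuristics:

```python
from dataclasses import dataclass

@dataclass
class Clip:
    """Minimal stand-in for an annotated driving clip."""
    scene_id: str
    mean_speed: float   # m/s, from the driver's trajectory
    speed_limit: float  # m/s, from road topology
    min_headway: float  # s, minimum time headway to the lead vehicle

def objective_label(clip: Clip) -> str:
    """Rule-based heuristic over behavioral statistics (thresholds assumed)."""
    if clip.mean_speed > 1.05 * clip.speed_limit or clip.min_headway < 1.0:
        return "aggressive"
    if clip.mean_speed < 0.85 * clip.speed_limit and clip.min_headway > 2.5:
        return "conservative"
    return "moderate"

def fuse(objective: str, subjective: str) -> tuple[str, bool]:
    """Accept agreeing labels automatically; flag disagreements for
    human-in-the-loop review (returned flag = needs_review)."""
    if objective == subjective:
        return objective, False
    return objective, True

clip = Clip("c1", mean_speed=16.0, speed_limit=13.9, min_headway=0.8)
label, needs_review = fuse(objective_label(clip), subjective="aggressive")
# Here both tracks agree, so the label is accepted without review.
```

Routing only disagreements to humans is a common way to keep annotation cost bounded while preserving label quality on the ambiguous cases where subjective judgment matters most.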