🤖 AI Summary
Current end-to-end autonomous driving systems are limited to following low-level steering commands: they cannot interpret and execute high-level human intentions expressed in natural language, and the field lacks a standardized benchmark for evaluating this capability. To address this, we propose Intention-Drive, the first comprehensive intent-driven benchmark for end-to-end autonomous driving, comprising a high-quality, multi-scenario dataset with fine-grained natural language intent annotations. We introduce Intent Success Rate (ISR), a novel semantic metric for evaluating intent fulfillment, and frame the task as a paradigm shift from “command execution” to “intent realization.” Our baseline model integrates multimodal fusion, scene graph reasoning, and semantic alignment. Evaluation on Intention-Drive reveals fundamental bottlenecks in cross-modal intent understanding and scene-aware collaborative reasoning. This work provides a reproducible evaluation standard and concrete directions for advancement in intent-driven autonomous driving.
📝 Abstract
Current end-to-end autonomous driving systems operate at a level of intelligence akin to following simple steering commands. However, achieving genuinely intelligent autonomy requires a paradigm shift: moving from merely executing low-level instructions to understanding and fulfilling high-level, abstract human intentions. This leap from command-follower to intention-fulfiller, as illustrated in our conceptual framework, is hindered by a fundamental challenge: the absence of a standardized benchmark to measure and drive progress on this complex task. To address this critical gap, we introduce Intention-Drive, the first comprehensive benchmark designed to evaluate a system's ability to translate high-level human intent into safe and precise driving actions. Intention-Drive makes two core contributions: (1) a new dataset of complex scenarios paired with corresponding natural language intentions, and (2) a novel evaluation protocol centered on the Intent Success Rate (ISR), which assesses the semantic fulfillment of the human's goal beyond simple geometric accuracy. Through an extensive evaluation of a spectrum of baseline models on Intention-Drive, we reveal a significant performance deficit, showing that these models struggle to achieve the comprehensive scene and intention understanding required for this advanced task.