DriveCritic: Towards Context-Aware, Human-Aligned Evaluation for Autonomous Driving with Vision-Language Models

๐Ÿ“… 2025-10-14
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing autonomous driving planning evaluation metrics (e.g., EPDMS) lack situational awareness and fail to align with human judgment. To address this, we propose DriveCriticโ€”the first planning evaluation dataset and framework integrating visual and symbolic context, grounded in human preference annotations. Methodologically, we construct a dataset covering complex traffic scenarios with fine-grained human preference labels, and design a vision-language model (VLM)-driven two-stage pipeline comprising supervised fine-tuning and reinforcement learning for multimodal trajectory assessment. Our core contribution is the first integration of vision-language understanding into planning evaluation, significantly improving contextual sensitivity and consistency with human preferences. Experiments demonstrate that DriveCritic substantially outperforms existing metrics in human preference alignment, while markedly enhancing evaluation reliability and interpretability.

Technology Category

Application Category

๐Ÿ“ Abstract
Benchmarking autonomous driving planners to align with human judgment remains a critical challenge, as state-of-the-art metrics like the Extended Predictive Driver Model Score (EPDMS) lack context awareness in nuanced scenarios. To address this, we introduce DriveCritic, a novel framework featuring two key contributions: the DriveCritic dataset, a curated collection of challenging scenarios where context is critical for correct judgment and annotated with pairwise human preferences, and the DriveCritic model, a Vision-Language Model (VLM) based evaluator. Fine-tuned using a two-stage supervised and reinforcement learning pipeline, the DriveCritic model learns to adjudicate between trajectory pairs by integrating visual and symbolic context. Experiments show DriveCritic significantly outperforms existing metrics and baselines in matching human preferences and demonstrates strong context awareness. Overall, our work provides a more reliable, human-aligned foundation to evaluating autonomous driving systems.
Problem

Research questions and friction points this paper is trying to address.

Evaluating autonomous driving alignment with human judgment
Addressing context awareness gaps in existing evaluation metrics
Developing vision-language model for nuanced scenario assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Vision-Language Model for autonomous driving evaluation
Fine-tunes model with supervised and reinforcement learning
Integrates visual and symbolic context for trajectory adjudication
๐Ÿ”Ž Similar Papers
No similar papers found.