🤖 AI Summary
Risk perception in safety-critical autonomous driving scenarios has not been systematically evaluated for vision-language models (VLMs); in particular, no comprehensive benchmark jointly addresses external environmental hazards and in-cabin human behaviors.
Method: We introduce DSBench—the first dedicated benchmark for autonomous driving safety risk perception—covering 10 broad risk categories and 28 fine-grained subcategories, with a high-quality annotated dataset of 98K samples. It establishes the first unified dual-domain (external + in-cabin) risk modeling framework and a multi-dimensional, fine-grained, human-in-the-loop evaluation protocol.
Contribution/Results: Zero-shot and fine-tuned evaluations of leading open- and closed-source VLMs reveal substantial performance degradation in complex risk scenarios. Fine-tuning on DSBench significantly improves safety-aware recognition, providing both critical infrastructure and empirical evidence to advance safety-oriented VLM development.
📝 Abstract
Vision-Language Models (VLMs) show great promise for autonomous driving, but their suitability for safety-critical scenarios remains largely unexplored, raising safety concerns. This gap stems from the lack of comprehensive benchmarks that simultaneously assess both external environmental risks and in-cabin driving behavior safety. To bridge it, we introduce DSBench, the first comprehensive Driving Safety Benchmark designed to assess a VLM's awareness of various safety risks in a unified manner. DSBench encompasses two major categories, external environmental risks and in-cabin driving behavior safety, organized into 10 key categories and a total of 28 sub-categories. This evaluation covers a wide range of scenarios, ensuring a thorough assessment of VLMs' performance in safety-critical contexts. Extensive evaluations across mainstream open-source and closed-source VLMs reveal significant performance degradation under complex safety-critical situations, highlighting urgent safety concerns. To address this, we constructed a large dataset of 98K instances focused on in-cabin and external safety scenarios, and show that fine-tuning on this dataset significantly enhances the safety performance of existing VLMs, paving the way for advancing autonomous driving technology. The benchmark toolkit, code, and model checkpoints will be publicly accessible.