Is Your VLM for Autonomous Driving Safety-Ready? A Comprehensive Benchmark for Evaluating External and In-Cabin Risks

📅 2025-11-18
🤖 AI Summary
Existing vision-language models (VLMs) have not been systematically evaluated for risk perception in safety-critical autonomous driving scenarios; in particular, no comprehensive benchmark jointly covers external environmental hazards and in-cabin human behaviors. Method: We introduce DSBench, the first dedicated benchmark for autonomous driving safety risk perception, covering 10 broad risk categories and 28 fine-grained sub-categories, together with a high-quality annotated dataset of 98K samples. It establishes the first unified dual-domain (external + in-cabin) risk modeling framework and a multi-dimensional, fine-grained, human-in-the-loop evaluation protocol. Contribution/Results: Zero-shot evaluations of leading open- and closed-source VLMs reveal substantial performance degradation in complex risk scenarios. Fine-tuning on the 98K-sample dataset significantly improves safety-aware recognition, providing both critical infrastructure and empirical evidence to advance safety-oriented VLM development.


📝 Abstract
Vision-Language Models (VLMs) show great promise for autonomous driving, but their suitability for safety-critical scenarios is largely unexplored, raising safety concerns. This issue arises from the lack of comprehensive benchmarks that assess both external environmental risks and in-cabin driving behavior safety simultaneously. To bridge this critical gap, we introduce DSBench, the first comprehensive Driving Safety Benchmark designed to assess a VLM's awareness of various safety risks in a unified manner. DSBench encompasses two major categories: external environmental risks and in-cabin driving behavior safety, divided into 10 key categories and a total of 28 sub-categories. This comprehensive evaluation covers a wide range of scenarios, ensuring a thorough assessment of VLMs' performance in safety-critical contexts. Extensive evaluations across various mainstream open-source and closed-source VLMs reveal significant performance degradation under complex safety-critical situations, highlighting urgent safety concerns. To address this, we constructed a large dataset of 98K instances focused on in-cabin and external safety scenarios, showing that fine-tuning on this dataset significantly enhances the safety performance of existing VLMs and paves the way for advancing autonomous driving technology. The benchmark toolkit, code, and model checkpoints will be publicly accessible.
Problem

Research questions and friction points this paper is trying to address.

Evaluating VLMs' safety awareness in autonomous driving scenarios
Assessing both external environmental and in-cabin behavioral risks
Addressing performance degradation in safety-critical driving situations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing DSBench for unified VLM safety evaluation
Constructing a 98K-instance safety dataset for fine-tuning VLMs
Assessing external and in-cabin risks across 10 categories and 28 sub-categories
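Because DSBench is organized hierarchically (two risk domains, 10 categories, 28 sub-categories), results can be reported at each level of that hierarchy. The sketch below is not DSBench's toolkit or its human-in-the-loop protocol: it swaps in plain exact-match accuracy and assumes each item is a multiple-choice question whose answer has already been parsed to an option label. The record fields (`domain`, `category`, `subcategory`, `gold`, `pred`), the helper `accuracy_by`, and all category labels are hypothetical placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record format: DSBench's real schema and category names are
# not given in the source, so these fields and labels are placeholders.
@dataclass
class Item:
    domain: str       # assumed: "external" or "in_cabin"
    category: str     # one of the 10 broad risk categories (placeholder names)
    subcategory: str  # one of the 28 fine-grained sub-categories (placeholder names)
    gold: str         # reference answer, assumed to be a single option label
    pred: str         # model answer, assumed already parsed to an option label


def accuracy_by(items, key):
    """Exact-match accuracy grouped by the Item attribute named `key`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for it in items:
        group = getattr(it, key)
        totals[group] += 1
        # Case- and whitespace-insensitive comparison of option labels.
        hits[group] += int(it.pred.strip().upper() == it.gold.strip().upper())
    return {g: hits[g] / totals[g] for g in totals}


if __name__ == "__main__":
    items = [
        Item("external", "cat_A", "sub_A1", "B", "B"),
        Item("external", "cat_A", "sub_A2", "C", "A"),
        Item("in_cabin", "cat_B", "sub_B1", "A", "A"),
        Item("in_cabin", "cat_B", "sub_B1", "D", "d"),
    ]
    print(accuracy_by(items, "domain"))  # {'external': 0.5, 'in_cabin': 1.0}
    print(accuracy_by(items, "subcategory"))
```

Grouping by `domain`, `category`, or `subcategory` with the same helper makes it easy to see whether a drop in overall accuracy is concentrated in particular risk types, which is the kind of fine-grained breakdown the benchmark is designed to expose.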
Xianhui Meng
University of Science and Technology of China
Yuchen Zhang
Georgia Institute of Technology
Zhijian Huang
Biochemistry Department and Beckman Institute, University of Illinois at Urbana-Champaign
modeling and simulation, quantum chemistry, membrane transporters and channels
Zheng Lu
Xiaomi EV
Ziling Ji
Fudan University
Yaoyao Yin
Xidian University
Hongyuan Zhang
The University of Hong Kong
Guangfeng Jiang
University of Science and Technology of China
Yandan Lin
Fudan University
Long Chen
Xiaomi EV
Hangjun Ye
Xiaomi EV
Li Zhang
University of Science and Technology of China
Jun Liu
University of Science and Technology of China
Xiaoshuai Hao
Beijing Academy of Artificial Intelligence (BAAI)
vision and language