Evaluation of Safety Cognition Capability in Vision-Language Models for Autonomous Driving

📅 2025-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current evaluation frameworks lack standardized benchmarks for assessing the safety-aware cognitive capabilities of vision-language models (VLMs) in autonomous driving—particularly in human–machine interaction scenarios. Method: We introduce SCD-Bench, the first safety-cognitive evaluation benchmark tailored to autonomous driving. It establishes a safety-cognition-driven evaluation paradigm, leverages an Autonomous Driving Image-Text Annotation System (ADA) for scalable multimodal data curation, and integrates expert annotation with LLM-based automated assessment, whose judgments agree with expert evaluations at a rate of 99.74%. Contribution/Results: Experiments reveal that mainstream open-source VLMs exhibit substantially weaker safety cognition than GPT-4o; lightweight models (1B–4B parameters) perform near-chance, exposing critical deployment bottlenecks. SCD-Bench provides both a rigorous evaluation standard and a methodological foundation for trustworthy integration of VLMs into safety-critical autonomous driving systems.

📝 Abstract
Assessing the safety of vision-language models (VLMs) in autonomous driving is particularly important; however, existing work focuses mainly on traditional benchmark evaluations. As interactive components within autonomous driving systems, VLMs must maintain strong safety cognition during interactions. From this perspective, we propose a novel evaluation method: the Safety Cognitive Driving Benchmark (SCD-Bench). To address the large-scale annotation challenge for SCD-Bench, we develop the Autonomous Driving Image-Text Annotation System (ADA). Additionally, to ensure data quality in SCD-Bench, the dataset undergoes manual refinement by experts with professional knowledge of autonomous driving. We further develop an automated evaluation method based on large language models (LLMs); to verify its effectiveness, we compare its evaluation results with those of expert human evaluations, achieving a consistency rate of 99.74%. Preliminary experimental results indicate that existing open-source models still lack sufficient safety cognition, showing a significant gap compared to GPT-4o. Notably, lightweight models (1B–4B) demonstrate minimal safety cognition. Since lightweight models are crucial for autonomous driving systems, this presents a significant challenge for integrating VLMs into the field.
Problem

Research questions and friction points this paper is trying to address.

Evaluating safety cognition in vision-language models for autonomous driving.
Developing a benchmark and annotation system for safety evaluation.
Assessing lightweight models' safety cognition for autonomous driving integration.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed Safety Cognitive Driving Benchmark (SCD-Bench)
Created Autonomous Driving Image-Text Annotation System (ADA)
Automated evaluation using large language models (LLMs)
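The 99.74% figure reported for the LLM-based automated evaluation is a consistency rate against expert human judgments, i.e., the fraction of items on which both evaluators give the same verdict. A minimal sketch of that computation, with hypothetical labels rather than the paper's actual data:

```python
def consistency_rate(auto_labels, expert_labels):
    """Fraction of items where the automated (LLM) verdict matches the expert verdict."""
    assert len(auto_labels) == len(expert_labels), "label lists must align item-by-item"
    matches = sum(a == e for a, e in zip(auto_labels, expert_labels))
    return matches / len(auto_labels)

# Hypothetical example: 3 of 4 verdicts agree -> consistency rate of 0.75
auto = ["safe", "unsafe", "safe", "safe"]
expert = ["safe", "unsafe", "safe", "unsafe"]
print(consistency_rate(auto, expert))  # 0.75
```

A plain match rate like this does not correct for chance agreement; stricter validations often also report a chance-adjusted statistic such as Cohen's kappa alongside it.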
Enming Zhang
School of Artificial Intelligence, University of Chinese Academy of Sciences; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
Peizhe Gong
School of Artificial Intelligence, University of Chinese Academy of Sciences; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
Xingyuan Dai
Institute of Automation, Chinese Academy of Sciences
Artificial Intelligence · Parallel Intelligence · Reinforcement Learning · ITS
Yisheng Lv
University of Chinese Academy of Sciences; Chinese Academy of Sciences
Parallel Intelligence · AI for Transportation · Autonomous Vehicles · Parallel Transportation Systems
Qinghai Miao
School of Artificial Intelligence, University of Chinese Academy of Sciences