HazardNet: A Small-Scale Vision Language Model for Real-Time Traffic Safety Detection at Edge Devices

📅 2025-02-27

🤖 AI Summary

To address the need for real-time traffic hazard detection on urban edge devices, this paper proposes HazardNet, a lightweight vision-language model. The paper makes three contributions: (1) it introduces HazardQA, a vision-language question-answering dataset designed specifically for traffic risk scenarios; (2) it adapts the open-source small-scale multimodal model Qwen2-VL-2B to edge-based safety detection, balancing low latency against high accuracy through supervised fine-tuning, domain-specific data curation, and inference optimization; (3) it targets deployment of HazardNet on resource-constrained edge hardware. Experiments show that HazardNet improves F1-score by up to 89% over the base model for safety event detection and is comparable to GPT-4o, exceeding it by up to 6% in some cases. Both the model and the HazardQA dataset are publicly released.

📝 Abstract

Traffic safety remains a vital concern in contemporary urban settings, intensified by the growing number of vehicles and the complexity of road networks. Traditional safety-critical event detection systems predominantly rely on sensor-based approaches and conventional machine learning algorithms, necessitating extensive data collection and complex training processes to adhere to traffic safety regulations. This paper introduces HazardNet, a small-scale Vision Language Model designed to enhance traffic safety by leveraging the reasoning capabilities of advanced language and vision models. We built HazardNet by fine-tuning the pre-trained Qwen2-VL-2B model, chosen for its superior performance among open-source alternatives and its compact size of two billion parameters, which facilitates deployment on edge devices with efficient inference throughput. In addition, we present HazardQA, a novel Vision Question Answering (VQA) dataset constructed specifically for training HazardNet on real-world scenarios involving safety-critical events. Our experimental results show that the fine-tuned HazardNet outperforms the base model by up to 89% in F1-Score and achieves results comparable to larger models such as GPT-4o, exceeding them by up to 6% in some cases. These advancements underscore the potential of HazardNet in providing real-time, reliable traffic safety event detection, thereby contributing to reduced accidents and improved traffic management in urban environments. Both the HazardNet model and the HazardQA dataset are available at https://huggingface.co/Tami3/HazardNet and https://huggingface.co/datasets/Tami3/HazardQA, respectively.

Problem

Research questions and friction points this paper is trying to address.

Real-time traffic safety detection on edge devices

Enhance safety using Vision Language Model capabilities

Improve accuracy and efficiency in traffic event detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned Qwen2-VL-2B for edge deployment

HazardQA dataset for real-world safety scenarios

Improved F1-Score by up to 89% over the base model
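
To make the F1-Score comparison above concrete, the following minimal Python sketch computes F1 for a binary safety-event detector and the relative gain of a fine-tuned model over a base model. The labels and predictions are made-up illustrative values, not data from the paper, and the helper names (f1_score, relative_improvement) are hypothetical. The sketch also assumes the reported improvement is relative; the summary does not state whether the 89% figure is relative or absolute.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_improvement(base, tuned):
    """Relative gain of `tuned` over `base`, as a fraction (0.89 means 89%)."""
    if base == 0:
        raise ValueError("relative improvement is undefined for a zero baseline")
    return (tuned - base) / base


# Illustrative labels only (1 = safety-critical event, 0 = no event).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
base_pred = [1, 0, 0, 0, 1, 0, 0, 0]   # hypothetical base-model outputs
tuned_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # hypothetical fine-tuned outputs

base_f1 = f1_score(y_true, base_pred)
tuned_f1 = f1_score(y_true, tuned_pred)
print(f"base F1 = {base_f1:.3f}, fine-tuned F1 = {tuned_f1:.3f}")
print(f"relative improvement = {relative_improvement(base_f1, tuned_f1):.0%}")
```

On this toy data the gap between the two models is larger than the figure reported in the paper; the point is only to show how the two quantities relate. A relative gain can look large when the base model's F1 is low, so it is worth reading alongside the absolute scores.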

🔎 Similar Papers

Urban Safety Perception Assessments via Integrating Multimodal Large Language Models with Street View Images