AI Summary
To address the need for real-time traffic hazard detection on urban edge devices, this paper proposes HazardNet, a lightweight vision-language model. Methodologically: (1) we introduce HazardQA, the first vision-language question-answering dataset specifically designed for traffic risk scenarios; (2) we pioneer the deep adaptation of the open-source small-scale multimodal model Qwen2-VL-2B to edge-based safety detection, achieving a low-latency, high-accuracy trade-off via supervised fine-tuning, domain-specific data curation, and inference optimization; (3) we deploy HazardNet end-to-end on resource-constrained edge hardware. Experiments show that HazardNet improves F1-score by up to 89% over the base model for safety event detection, with some metrics surpassing GPT-4o by up to 6%. Both the model and the HazardQA dataset are publicly released.
Abstract
Traffic safety remains a vital concern in contemporary urban settings, intensified by the growing number of vehicles and the complexity of road networks. Traditional safety-critical event detection systems rely predominantly on sensor-based approaches and conventional machine learning algorithms, which require extensive data collection and complex training processes to comply with traffic safety regulations. This paper introduces HazardNet, a small-scale Vision Language Model designed to enhance traffic safety by leveraging the reasoning capabilities of advanced language and vision models. We built HazardNet by fine-tuning the pre-trained Qwen2-VL-2B model, chosen for its superior performance among open-source alternatives and its compact size of two billion parameters, which facilitates deployment on edge devices with efficient inference throughput. In addition, we present HazardQA, a novel Visual Question Answering (VQA) dataset constructed specifically for training HazardNet on real-world scenarios involving safety-critical events. Our experimental results show that the fine-tuned HazardNet outperforms the base model by up to 89% in F1-score and achieves results comparable to larger models such as GPT-4o, with improvements of up to 6% in some cases. These advancements underscore the potential of HazardNet to provide real-time, reliable traffic safety event detection, thereby contributing to fewer accidents and improved traffic management in urban environments. Both the HazardNet model and the HazardQA dataset are available at https://huggingface.co/Tami3/HazardNet and https://huggingface.co/datasets/Tami3/HazardQA, respectively.
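The abstract reports F1-score gains but does not state whether the 89% figure is a relative or an absolute improvement. The sketch below shows how both quantities could be computed for a binary safety-event detector. The helper names (`f1_score`, `improvement`) and the toy labels are hypothetical and chosen only for illustration; they are not taken from the paper or from HazardQA.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def improvement(base_f1, tuned_f1):
    """Return (absolute, relative) F1 improvement of the tuned model over the base."""
    absolute = tuned_f1 - base_f1
    relative = absolute / base_f1 if base_f1 > 0 else float("inf")
    return absolute, relative


# Toy labels (illustrative only): 1 = safety-critical event, 0 = no event.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
base_pred = [1, 0, 0, 0, 0, 0, 1, 0]   # hypothetical base-model answers
tuned_pred = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical fine-tuned answers

base_f1 = f1_score(y_true, base_pred)
tuned_f1 = f1_score(y_true, tuned_pred)
abs_gain, rel_gain = improvement(base_f1, tuned_f1)
print(f"base F1={base_f1:.3f}, tuned F1={tuned_f1:.3f}")
print(f"absolute gain={abs_gain:.3f}, relative gain={rel_gain:.1%}")
```

Because the two readings can differ widely when the base score is low, it helps to check the paper's results tables to see which one the reported percentage refers to.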