Understanding Real-World Traffic Safety through RoadSafe365 Benchmark

📅 2026-02-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of fine-grained evaluation aligned with official safety standards in existing traffic benchmarks. The authors introduce RoadSafe365, a large-scale vision-language benchmark built from real-world video that connects official traffic safety standards with data-driven traffic understanding. It is independently curated and organized by a hierarchical taxonomy that refines and extends the definitions of crash, incident, and violation. The dataset comprises 36,196 annotated clips from dashcam and surveillance cameras, paired with multiple-choice question-answer sets (864K candidate options, 8.4K unique answers) and 36K scene descriptions. The authors establish strong baselines, report consistent gains from fine-tuning on RoadSafe365, and validate the benchmark with cross-domain experiments on real and synthetic datasets.

📝 Abstract

Although recent traffic benchmarks have advanced multimodal data analysis, they generally lack systematic evaluation aligned with official safety standards. To fill this gap, we introduce RoadSafe365, a large-scale vision-language benchmark that supports fine-grained analysis of traffic safety from extensive and diverse real-world video data collections. Unlike prior works that focus primarily on coarse accident identification, RoadSafe365 is independently curated and systematically organized using a hierarchical taxonomy that refines and extends foundational definitions of crash, incident, and violation to bridge official traffic safety standards with data-driven traffic understanding systems. RoadSafe365 provides rich attribute annotations across diverse traffic event types, environmental contexts, and interaction scenarios, yielding 36,196 annotated clips from both dashcam and surveillance cameras. Each clip is paired with multiple-choice question-answer sets, comprising 864K candidate options, 8.4K unique answers, and 36K detailed scene descriptions collectively designed for vision-language understanding and reasoning. We establish strong baselines and observe consistent gains when fine-tuning on RoadSafe365. Cross-domain experiments on both real and synthetic datasets further validate its effectiveness. Designed for large-scale training and standardized evaluation, RoadSafe365 provides a comprehensive benchmark to advance reproducible research in real-world traffic safety analysis.
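
To make the evaluation setup described in the abstract more concrete, here is a minimal sketch of how multiple-choice answers could be scored per top-level taxonomy category. It is not the authors' code: the `MCQItem` schema, its field names, and the `accuracy_by_category` helper are all assumptions made for illustration, and the actual RoadSafe365 data format may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MCQItem:
    """One multiple-choice question tied to a clip (hypothetical schema)."""
    clip_id: str
    category: str       # assumed top-level label: "crash", "incident", or "violation"
    question: str
    options: list       # candidate answer strings
    answer_index: int   # index of the correct option in `options`


def accuracy_by_category(items, predictions):
    """Return {category: accuracy} for predicted option indices, one per item."""
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    correct = defaultdict(int)
    total = defaultdict(int)
    for item, pred in zip(items, predictions):
        total[item.category] += 1
        if pred == item.answer_index:
            correct[item.category] += 1
    return {cat: correct[cat] / total[cat] for cat in total}
```

Reporting accuracy per category rather than as a single pooled number is one way to reflect the fine-grained analysis the benchmark is aiming for, since a model can do well on coarse crash detection while doing poorly on violations.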

Problem

Research questions and friction points this paper is trying to address.

traffic safety

benchmark

vision-language

real-world video

safety standards

Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language benchmark

traffic safety analysis

hierarchical taxonomy