🤖 AI Summary
Road traffic collisions cause millions of fatalities annually and disproportionately affect low- and middle-income countries (LMICs); conventional manual safety assessments are costly and difficult to scale, while existing CNN-based approaches require large annotated datasets and domain-specific fine-tuning. This paper introduces Visual Road Safety Assessment (V-RoAst), a new task that uses lightweight vision-language models (e.g., Gemini-1.5-flash, GPT-4o-mini) for zero-shot, training-free recognition of road attributes and estimation of iRAP safety ratings. By combining Mapillary street-view imagery with carefully engineered prompts, V-RoAst delivers interpretable, low-cost, fully automated assessments without requiring locally labeled data. Experiments indicate that the approach generalizes to real-world settings and is practical to deploy. The framework offers a scalable paradigm for automated road safety evaluation, particularly suited to resource-constrained LMIC contexts.
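To make the zero-shot setup concrete, a minimal sketch of a single attribute query might look like the following. This is not the authors' code: the use of the OpenAI Python SDK, the prompt wording, and the lane-count attribute are illustrative assumptions; the paper defines its own prompts and iRAP attribute schema.

```python
# Minimal sketch of one zero-shot VQA attribute query, assuming the
# OpenAI Python SDK. Prompt text and attribute choices are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical question for one iRAP-style attribute (lane count).
question = (
    "You are a road safety auditor. Look at this street-level image and "
    "answer: how many traffic lanes does the road have? "
    "Reply with exactly one of: 'One', 'Two', 'Three or more'."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            # The image URL can be any public image, e.g. a Mapillary
            # thumbnail; a base64 data URI also works.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/street_view.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)  # e.g. "Two"
```

Because no fine-tuning is involved, swapping in a different attribute or a different VLM is just a change of prompt and model name, which is what makes the approach attractive where labeled local data is unavailable.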
📝 Abstract
Road traffic crashes cause millions of deaths annually and impose a significant economic burden, particularly in low- and middle-income countries (LMICs). This paper presents an approach that uses Vision Language Models (VLMs) for road safety assessment, overcoming the limitations of traditional Convolutional Neural Networks (CNNs). We introduce a new task, V-RoAst (Visual question answering for Road Assessment), together with a real-world dataset. Our approach optimizes prompt engineering and evaluates advanced VLMs, including Gemini-1.5-flash and GPT-4o-mini. These models effectively examine the attributes needed for road assessment. Using crowdsourced imagery from Mapillary, our scalable solution estimates road safety levels effectively. In addition, the approach is designed for local stakeholders who lack resources, as it does not require training data. It offers a cost-effective, automated method for global road safety assessments, potentially saving lives and reducing economic burdens.
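For the crowdsourced-imagery side, a hedged sketch of retrieving one street-view image via the Mapillary Graph API (v4) is shown below; the access token, image ID, and requested field are placeholders, not details taken from the paper.

```python
# Sketch of fetching a crowdsourced image URL from the Mapillary Graph
# API (v4). Token and image ID below are placeholders.
import requests

MAPILLARY_TOKEN = "MLY|<app_id>|<token>"  # placeholder access token
image_id = "123456789"                    # hypothetical Mapillary image ID

resp = requests.get(
    f"https://graph.mapillary.com/{image_id}",
    params={"access_token": MAPILLARY_TOKEN, "fields": "thumb_2048_url"},
    timeout=30,
)
resp.raise_for_status()
image_url = resp.json()["thumb_2048_url"]
# image_url can then be passed to the VLM query sketched above.
```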