🤖 AI Summary
Road traffic collisions cause millions of fatalities annually and disproportionately affect low- and middle-income countries (LMICs); conventional manual safety assessments are costly and difficult to scale, while existing CNN-based approaches require large annotated datasets and domain-specific fine-tuning. This paper introduces Visual Road Safety Assessment (V-RoAst), a new task that uses lightweight vision-language models (e.g., Gemini-1.5-flash, GPT-4o-mini) for zero-shot, training-free recognition of road attributes and estimation of iRAP safety ratings. By combining Mapillary street-view imagery with carefully engineered prompts, V-RoAst delivers interpretable, low-cost, fully automated assessments without requiring locally labeled data. Experiments indicate that the approach generalizes to real-world settings and is practical to deploy. The framework offers a scalable paradigm for automated road safety evaluation, particularly suited to resource-constrained LMIC contexts.
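To make the zero-shot setup concrete, a minimal sketch of a single attribute query might look like the following. This is not the authors' code: the use of the OpenAI Python SDK, the prompt wording, and the lane-count attribute are illustrative assumptions; the paper defines its own prompts and iRAP attribute schema.

```python
# Minimal sketch of one zero-shot VQA attribute query, assuming the
# OpenAI Python SDK. Prompt text and attribute choices are hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical question for one iRAP-style attribute (lane count).
question = (
    "You are a road safety auditor. Look at this street-level image and "
    "answer: how many traffic lanes does the road have? "
    "Reply with exactly one of: 'One', 'Two', 'Three or more'."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            # The image URL can be any public image, e.g. a Mapillary
            # thumbnail; a base64 data URI also works.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/street_view.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)  # e.g. "Two"
```

Because no fine-tuning is involved, swapping in a different attribute or a different VLM is just a change of prompt and model name, which is what makes the approach attractive where labeled local data is unavailable.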
📝 Abstract
Road traffic crashes cause millions of deaths annually and impose a significant economic burden, particularly in low- and middle-income countries (LMICs). This paper presents an approach that uses Vision Language Models (VLMs) for road safety assessment, overcoming the limitations of traditional Convolutional Neural Networks (CNNs). We introduce a new task, V-RoAst (Visual question answering for Road Assessment), together with a real-world dataset. Our approach optimizes prompt engineering and evaluates advanced VLMs, including Gemini-1.5-flash and GPT-4o-mini. These models effectively examine the attributes needed for road assessment. Using crowdsourced imagery from Mapillary, our scalable solution estimates road safety levels effectively. In addition, the approach is designed for local stakeholders who lack resources, as it does not require training data. It offers a cost-effective, automated method for global road safety assessments, potentially saving lives and reducing economic burdens.
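For the crowdsourced-imagery side, a hedged sketch of retrieving one street-view image via the Mapillary Graph API (v4) is shown below; the access token, image ID, and requested field are placeholders, not details taken from the paper.

```python
# Sketch of fetching a crowdsourced image URL from the Mapillary Graph
# API (v4). Token and image ID below are placeholders.
import requests

MAPILLARY_TOKEN = "MLY|<app_id>|<token>"  # placeholder access token
image_id = "123456789"                    # hypothetical Mapillary image ID

resp = requests.get(
    f"https://graph.mapillary.com/{image_id}",
    params={"access_token": MAPILLARY_TOKEN, "fields": "thumb_2048_url"},
    timeout=30,
)
resp.raise_for_status()
image_url = resp.json()["thumb_2048_url"]
# image_url can then be passed to the VLM query sketched above.
```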