V-RoAst: A New Dataset for Visual Road Assessment

📅 2024-08-20
🏛️ arXiv.org
🤖 AI Summary
Road traffic collisions cause millions of fatalities annually, disproportionately affecting low- and middle-income countries (LMICs). Conventional manual safety assessments are costly and scale poorly, while existing CNN-based approaches require large annotated datasets and domain-specific fine-tuning. This paper introduces V-RoAst (Visual question answering for Road Assessment), a task that uses lightweight vision-language models (e.g., Gemini-1.5-flash, GPT-4o-mini) for zero-shot, training-free recognition of road attributes and estimation of iRAP safety ratings. By combining Mapillary street-view imagery with optimized prompt engineering, V-RoAst provides interpretable, low-cost, automated assessment that does not require local labeled data. The authors report that the approach generalizes across real-world settings and is practical to deploy. The framework offers a scalable approach to road safety evaluation that is particularly suited to resource-constrained LMIC contexts.

📝 Abstract
Road traffic crashes cause millions of deaths annually and have a significant economic impact, particularly in low- and middle-income countries (LMICs). This paper presents an approach using Vision Language Models (VLMs) for road safety assessment, overcoming the limitations of traditional Convolutional Neural Networks (CNNs). We introduce a new task, V-RoAst (Visual question answering for Road Assessment), with a real-world dataset. Our approach optimizes prompt engineering and evaluates advanced VLMs, including Gemini-1.5-flash and GPT-4o-mini. The models effectively examine attributes for road assessment. Using crowdsourced imagery from Mapillary, our scalable solution estimates road safety levels. In addition, this approach is designed for local stakeholders who lack resources, as it does not require training data. It offers a cost-effective and automated method for global road safety assessments, potentially saving lives and reducing economic burdens.
Problem

Research questions and friction points this paper is trying to address.

Can VLMs replace human-labelled data for road safety assessments?
How can road safety evaluations be automated using zero-shot VLMs?
How can the geographic limitations of CNN-based road assessment models be overcome?
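
One way to make the first question concrete is to measure how often VLM answers agree with human-coded attributes, attribute by attribute. The sketch below is only an illustration of that idea: the function name, the attribute names, and the labels are all made up here, and plain accuracy is an assumed metric, not necessarily the one the paper reports.

```python
from collections import defaultdict


def per_attribute_accuracy(reference, predicted):
    """Share of segments where the prediction matches the reference label."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for segment_id, ref_labels in reference.items():
        pred_labels = predicted.get(segment_id, {})
        for attribute, ref_value in ref_labels.items():
            totals[attribute] += 1
            # A missing prediction counts as a miss.
            if pred_labels.get(attribute) == ref_value:
                hits[attribute] += 1
    return {attribute: hits[attribute] / totals[attribute] for attribute in totals}


# Made-up labels for illustration only; not data from the paper.
reference = {
    "seg_1": {"street_lighting": "present", "sidewalk": "present"},
    "seg_2": {"street_lighting": "not present", "sidewalk": "present"},
}
predicted = {
    "seg_1": {"street_lighting": "present", "sidewalk": "not present"},
    "seg_2": {"street_lighting": "not present"},
}

scores = per_attribute_accuracy(reference, predicted)
print(scores)
```

Reporting agreement per attribute rather than as one pooled number shows which attributes a zero-shot model handles well and which still need human coding.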
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Vision Language Models for road assessment
Leverages crowdsourced imagery without labeled data
Optimizes prompt engineering for automated solutions
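
The zero-shot setup described above can be sketched as a prompt that lists each road attribute with its allowed options, plus a parser that keeps only valid answers. This is a minimal sketch under stated assumptions: the attribute names and options are hypothetical rather than the paper's or iRAP's actual codes, and `fake_vlm` is a stand-in that returns a canned reply instead of calling Gemini-1.5-flash or GPT-4o-mini.

```python
import json

# Hypothetical attribute schema: names and options are illustrative only,
# not the attribute codes used in the paper or the iRAP specification.
ATTRIBUTES = {
    "number_of_lanes": ["one", "two", "three or more"],
    "street_lighting": ["present", "not present"],
    "sidewalk": ["present", "not present"],
}


def build_prompt(attributes):
    """Compose one zero-shot instruction covering every attribute."""
    lines = [
        "You are assessing a street-level image of a road segment.",
        "For each attribute, choose exactly one of the listed options.",
        "Reply with a single JSON object mapping attribute names to options.",
        "",
    ]
    for name, options in attributes.items():
        lines.append(f"- {name}: {' | '.join(options)}")
    return "\n".join(lines)


def parse_answer(raw_text, attributes):
    """Keep only answers that name a known attribute and an allowed option."""
    start, end = raw_text.find("{"), raw_text.rfind("}")
    if start == -1 or end <= start:
        return {}
    try:
        data = json.loads(raw_text[start:end + 1])
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    return {
        name: value
        for name, value in data.items()
        if name in attributes and value in attributes[name]
    }


def fake_vlm(prompt, image_path):
    """Stand-in for a real VLM call; returns a canned reply for the demo."""
    return ('Here is my answer: {"number_of_lanes": "two", '
            '"street_lighting": "present", "sidewalk": "maybe"}')


prompt = build_prompt(ATTRIBUTES)
result = parse_answer(fake_vlm(prompt, "segment_001.jpg"), ATTRIBUTES)
print(result)
```

Constraining the model to a fixed option list and discarding anything outside it (here, the `"maybe"` answer for `sidewalk`) keeps the output usable by downstream scoring without any model training.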
Authors

Natchapon Jongwiriyanurak
Department of Civil, Environmental and Geomatic Engineering, University College London

Zichao Zeng
Department of Civil, Environmental and Geomatic Engineering, University College London

June Moh Goo
Department of Civil, Environmental and Geomatic Engineering, University College London

Xinglei Wang
PhD Student, University College London
GIScience, Human mobility, Urban analytics, Spatio-temporal data mining

Ilya Ilyankou
Department of Civil, Environmental and Geomatic Engineering, University College London

Kerkritt Srirrongvikrai
Department of Civil Engineering, Chulalongkorn University

Meihui Wang
University College London
urban data science, urban analytics, GeoAI, Space-Time Analytics

James Haworth
Associate Professor in Spatio-temporal Analytics, University College London
GIScience, Spatio-temporal, Machine Learning, Transport, Intelligent Transportation Systems