AI Summary
Conventional pavement condition assessment relies either on subjective manual inspection or on supervised learning models that require extensive labeled data; both limit scalability and objectivity.
Method: This paper proposes a zero-shot, multimodal large language model (MLLM)-driven automated assessment framework. Without any annotated training data, it leverages structured prompt engineering to guide vision-language models (e.g., GPT-4V, Qwen-VL) in interpreting Google Street View imagery and directly generating natural-language evaluations compliant with the Pavement Surface Condition Index (PSCI) standard.
Contribution/Results: It establishes the first LLM-native zero-shot pavement assessment paradigm, eliminating the data annotation bottleneck. Empirical evaluation on Google Street View images shows higher accuracy and consistency than human experts of varying experience levels, supporting the feasibility of large-scale, fully automated pavement inspection.
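To make the idea of structured prompting concrete, the following is a minimal sketch of how a zero-shot prompt might be assembled and how a model's free-text reply might be reduced to a numeric rating. It is not the authors' prompt. The 1-10 scale, the rubric wording in `RUBRIC`, the `PSCI rating:` output convention, and the function names `build_prompt` and `parse_rating` are all assumptions introduced for illustration; the call to an actual vision-language model is deliberately left out.

```python
import re
from typing import Dict, Optional

# Hypothetical rubric bands. The wording is illustrative only and is NOT
# taken from the official PSCI standard or from the paper's prompts.
RUBRIC: Dict[str, str] = {
    "9-10": "no visible defects or only minor surface defects",
    "7-8": "some surface defects, no structural distress",
    "5-6": "moderate surface defects or early structural distress",
    "3-4": "significant structural distress",
    "1-2": "severe structural distress, road near failure",
}


def build_prompt(rubric: Dict[str, str]) -> str:
    """Assemble a structured zero-shot prompt: role, rubric, steps, output format."""
    bands = "\n".join(f"- {band}: {desc}" for band, desc in rubric.items())
    return (
        "You are a pavement engineer rating a road image.\n"
        "Rating bands (assumed 1-10 scale, 10 is best):\n"
        f"{bands}\n"
        "Steps: 1) list visible defects, 2) judge their extent, "
        "3) choose a single rating.\n"
        "Finish with exactly one line of the form 'PSCI rating: <integer>'."
    )


def parse_rating(reply: str) -> Optional[int]:
    """Extract the rating from a model reply; return None if absent or out of range."""
    match = re.search(r"PSCI\s*rating\s*:\s*(\d+)\b", reply, re.IGNORECASE)
    if match is None:
        return None
    rating = int(match.group(1))
    return rating if 1 <= rating <= 10 else None
```

The prompt text would be sent to the model together with the street-level image, and `parse_rating` would then be applied to the reply. Fixing the output format in the prompt is what makes the reply machine-readable without any fine-tuning.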
Abstract
Effective and rapid evaluation of pavement surface condition is critical for prioritizing maintenance, ensuring transportation safety, and minimizing vehicle wear and tear. While conventional manual inspections suffer from subjectivity, existing machine learning-based methods are constrained by their reliance on large, high-quality labeled datasets, which require significant resources and limit adaptability across varied road conditions. Recent advances in Large Language Models (LLMs) offer significant potential for overcoming these challenges. In this study, we propose an automated zero-shot learning approach that leverages the image recognition and natural language understanding capabilities of LLMs to assess road conditions. Multiple LLM-based assessment models were developed, employing prompt engineering strategies aligned with the Pavement Surface Condition Index (PSCI) standards. The accuracy and reliability of these models were evaluated against official PSCI results, and an optimized model was selected. Extensive tests then benchmarked the optimized model against evaluations from experts of varying experience levels, using Google Street View road images. The results show that the LLM-based approach can effectively assess road conditions: the optimized model, which employs comprehensive and structured prompt engineering strategies, outperforms simpler configurations in accuracy and consistency and even surpasses expert evaluations. Moreover, the successful application of the optimized model to Google Street View images demonstrates its potential for future city-scale deployments. These findings highlight the potential of LLMs for automating road damage evaluation and underscore the pivotal role of detailed prompt engineering in achieving reliable assessments.
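The comparison against official PSCI results can be illustrated with a short sketch. The abstract does not state which metrics were used, so the two measures below (exact-match accuracy and agreement within one rating point) are assumptions chosen for illustration, as are the function names and the sample ratings.

```python
from typing import Sequence


def exact_accuracy(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of images where the predicted rating equals the reference rating."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def within_one(predicted: Sequence[int], reference: Sequence[int]) -> float:
    """Fraction of images where the predicted rating is within one point of the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - r) <= 1 for p, r in zip(predicted, reference))
    return hits / len(reference)


# Made-up ratings for four images (not data from the paper).
model_ratings = [7, 5, 9, 3]
official_ratings = [7, 6, 9, 1]

print(exact_accuracy(model_ratings, official_ratings))  # 0.5
print(within_one(model_ratings, official_ratings))      # 0.75
```

The same two functions could be applied to each expert's ratings, which would put model and human raters on a common footing.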