🤖 AI Summary
This study addresses the challenge of achieving pixel-level accurate localization of pavement distresses, which remains difficult for existing methods. Building upon the Mask R-CNN instance segmentation framework, the authors perform fine-grained segmentation of longitudinal cracks, transverse cracks, alligator cracking, and potholes on a newly curated UWGB-StreetCrack dataset comprising real-world road images captured by vehicle-mounted smartphones. To the best of our knowledge, this is the first work to demonstrate high-precision pixel-level distress segmentation using such imagery, while also validating the efficacy of instance segmentation for quantifying crack area coverage. Experiments conducted on the Detectron2 platform employ various backbone architectures—including ResNet-101 FPN—and benchmark against object detection approaches like YOLO. The optimal model achieves a precision of 84.23%, recall of 90.04%, and F1-score of 87.04%, with a predicted crack area ratio of 2.164%, closely matching the ground-truth value of 2.170%.
📝 Abstract
Automated pavement distress assessment requires more than image-level classification or coarse bounding box detection, demanding precise localization of thin, branching, and irregular cracks to achieve the geometric precision necessary for maintenance-relevant quantification. This paper presents a vision-based pavement distress analysis system based on Mask R-CNN instance segmentation and evaluates it on UWGB-StreetCrack, a custom field-collected roadway image dataset acquired with a vehicle-mounted smartphone and manually annotated with polygon labels for longitudinal cracks, transverse cracks, alligator cracks, and potholes. Five Detectron2-based Mask R-CNN backbone variants were considered under a consistent fine-tuning protocol. The best-performing model, Mask R-CNN with a ResNet-101 FPN backbone, achieved 84.23% precision, 90.04% recall, and an F1 score of 87.04% under the project-specific bounding-box matching protocol. The same model produced an aggregate predicted crack-area fraction of 2.164%, closely matching the 2.170% ground-truth crack-area fraction. To contextualize the segmentation system against a detector-oriented alternative, a CSPDarknet53-based YOLO detector was also adapted and retrained on the dataset, reaching 27.5% precision and 20.7% recall on the validation protocol. The results show that instance segmentation is a practical direction for field pavement imagery and aggregate crack-area estimation, while also exposing open challenges in annotation consistency, class imbalance, confounder rejection, and mask-level benchmarking.