🤖 AI Summary
To address the lack of region-specific benchmark data for pothole detection on Bangladeshi roads, this work introduces and publicly releases the first geographically tailored pothole image dataset: 824 annotated real-world samples collected in Dhaka and Bogura. We evaluate nine classification models (including CCT, Swin Transformer, and ResNet50) and four segmentation models (U-Net variants), and propose a data augmentation strategy aimed at small-sample road defect analysis (10× for classification, 4× for segmentation). Lightweight models such as CCT achieve classification accuracy and F1-scores exceeding 99%, matching heavier architectures (ResNet50, DenseNet201); segmentation reaches a Dice score of up to 67.54% and an IoU of up to 59.39%, with 2–5× faster inference. Key contributions: (1) the first Bangladeshi pothole benchmark dataset; (2) empirical evidence that lightweight architectures remain effective under resource constraints; and (3) an augmentation scheme tuned for small-sample road defect analysis.
📝 Abstract
This study presents a comprehensive performance analysis of popular classification and segmentation models applied to a Bangladeshi pothole dataset developed by the authors. The custom dataset of 824 samples, collected from the streets of Dhaka and Bogura, performs competitively against the industrial and custom datasets used in the existing literature. The dataset was further augmented four-fold for segmentation and ten-fold for classification. We tested nine classification models (CCT, CNN, INN, Swin Transformer, ConvMixer, VGG16, ResNet50, DenseNet201, and Xception) and four segmentation models (U-Net, ResU-Net, U-Net++, and Attention U-Net) on both the original and augmented datasets. Among the classification models, the lightweight ones (CCT, CNN, INN, Swin Transformer, and ConvMixer) were emphasized because of their low computational requirements and faster prediction times. The lightweight models performed respectably, often matching the heavyweight models. In addition, augmentation improved the performance of all tested models. The experimental results show that models trained on our dataset perform on par with or better than similar classification models in the existing literature, reaching accuracy and F1-scores over 99%. For segmentation, the dataset also performed on par with existing datasets, with Dice Similarity Coefficients up to 67.54% and IoU scores up to 59.39%.
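For readers unfamiliar with the two segmentation metrics quoted above, the following is a minimal sketch of how the Dice Similarity Coefficient and IoU are commonly computed for a pair of binary masks. It is not the authors' evaluation code: the function name `dice_and_iou`, the list-of-lists mask format, and the convention of scoring two empty masks as 1.0 are all assumptions made for illustration.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two binary masks given as equal-sized 2D lists of 0/1.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Flatten both masks and coerce every pixel to 0 or 1.
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_true = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_true):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as pothole in both masks.
    intersection = sum(p & t for p, t in zip(flat_pred, flat_true))
    pred_area = sum(flat_pred)
    true_area = sum(flat_true)
    union = pred_area + true_area - intersection

    if union == 0:
        # Both masks are empty: scored as perfect agreement (assumed convention).
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + true_area)
    iou = intersection / union
    return dice, iou


# Toy 3x3 example: each mask marks 3 pixels, and 2 of them overlap.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
truth = [[0, 1, 0],
         [0, 1, 1],
         [0, 0, 0]]
dice, iou = dice_and_iou(pred, truth)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For a single mask pair the two metrics are tied by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU. That identity generally does not hold for scores averaged over many images, which is one plausible reason dataset-level figures such as those reported here need not satisfy it exactly.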