🤖 AI Summary
To address road defect detection on resource-constrained edge devices, this paper proposes a lightweight solution that avoids ensemble methods and test-time augmentation (TTA). The core method is a jointly trained framework comprising an image generator and a detector. Specifically, the authors design a dual-discriminator GAN architecture and incorporate a CLIP-based Fréchet Inception Distance (FID) loss to improve the quality of synthesized defect images. Because the generator is trained jointly with the detector, it is encouraged to synthesize harder examples, which are then used for data augmentation. Evaluated on the multi-national RDD2022 benchmark, the approach outperforms the state-of-the-art method without ensembles or TTA while using less than 20% of the parameters of the competing baseline, making it better suited to edge deployment.
📝 Abstract
Road defect detection is important for road authorities seeking to reduce the vehicle damage caused by road defects. Because defect detectors are typically deployed on edge devices with limited memory and computational resources, we aim to perform road defect detection without ensemble-based methods or test-time augmentation (TTA). To this end, we propose to Jointly Train the image Generator and Detector for road defect detection (dubbed JTGD). We design dual discriminators for the generative model so that both the synthesized defect patches and the overall images look plausible. The quality of the synthesized images is further improved by our proposed CLIP-based Fréchet Inception Distance loss. The generative model in JTGD is trained jointly with the detector, which encourages it to synthesize harder examples for the detector. Because these harder, higher-quality synthesized images are used for data augmentation, JTGD outperforms the state-of-the-art method on the RDD2022 road defect detection benchmark across various countries when neither ensembles nor TTA are used. JTGD uses less than 20% of the parameters of the competing baseline, which makes it more suitable for deployment on edge devices in practice.
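
The abstract does not spell out how the CLIP-based Fréchet Inception Distance loss is computed. As a rough illustration only, the sketch below shows the standard squared Fréchet distance between Gaussians fitted to two sets of feature vectors, which is the quantity FID-style measures are built on. The function name `frechet_distance` and the random arrays standing in for image embeddings are assumptions made for this example; in the paper's setting the features would presumably come from a CLIP image encoder, and a training loss would need a differentiable implementation rather than this NumPy version.

```python
import numpy as np


def frechet_distance(feats_real, feats_fake):
    """Squared Frechet distance between Gaussians fitted to two feature sets.

    feats_real, feats_fake: arrays of shape (n_samples, dim), e.g. image
    embeddings (hypothetical stand-ins for CLIP features in this sketch).
    """
    mu_r = feats_real.mean(axis=0)
    mu_f = feats_fake.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f; tiny negative values caused by
    # round-off are clipped to zero before taking the root.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_f) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 4))  # placeholder "real" features
    fake = real + 3.0                 # same spread, shifted mean
    print(frechet_distance(real, real))  # close to zero
    print(frechet_distance(real, fake))  # close to 4 * 3**2 = 36
```

A lower value means the synthesized-image features are statistically closer to the real-image features, which is why a distance of this kind can serve as a quality signal for a generator.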