🤖 AI Summary
UAV-based inspection images of transmission lines are captured from long distances and at large tilt angles, so they carry sparse visual information about defects and detection accuracy suffers. To address this, the paper proposes a defect detection method that combines Vision-Language Pre-training for Transmission Line (VLP-TL) with a Progressive Transfer Strategy (PTS). VLP-TL uses two pre-training tasks tailored to transmission line scenarios to train an image encoder on both visual and linguistic information. PTS bridges the semantic gap between the pre-training domain and the downstream detection task via multi-stage knowledge transfer and cross-modal feature alignment. Evaluated on a real-world inspection dataset, the proposed method significantly improves detection of small-scale and blurry defects, with a gain of over 12.6% in mAP. It also demonstrates better robustness and generalization than state-of-the-art unimodal approaches.
📝 Abstract
Unmanned aerial vehicle (UAV) patrol inspection has emerged as a predominant approach to transmission line monitoring owing to its cost-effectiveness. Detecting defects in transmission lines is a critical task during UAV patrol inspection. However, due to imaging distance and shooting angles, UAV patrol images often contain insufficient defect-related visual information, which adversely affects detection accuracy. In this article, we propose a novel method for detecting defects in UAV patrol images, based on vision-language pretraining for transmission line (VLP-TL) and a progressive transfer strategy (PTS). Specifically, VLP-TL contains two novel pretraining tasks tailored for the transmission line scenario, aiming to pretrain an image encoder with abundant knowledge acquired from both visual and linguistic information. Transferring the pretrained image encoder to the defect detector as its backbone can effectively alleviate the problem of insufficient visual information. In addition, the PTS further improves transfer performance by progressively bridging the gap between pretraining and downstream defect detection. Experimental results demonstrate that the proposed method significantly improves defect detection accuracy by jointly utilizing multimodal information, overcoming the limitation of insufficient defect-related visual information in UAV patrol images.