Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models

📅 2026-03-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the poor performance of defect classification models caused by the scarcity of power line insulator defect images. It proposes a training-free synthetic data generation approach that uses an off-the-shelf multimodal large language model (MLLM) as the image generator, synthesizing defect images from visual references and text prompts, with dual-reference conditioning to increase diversity. To maintain synthetic data quality, the method adds lightweight human verification with prompt refinement, then filters the synthetic pool by embedding-space distance to class centroids computed from the real training split. In a low-data regime with only 104 real training images, adding the selected synthetic images raises the test F1 score from 0.615 to 0.739, an estimated 4–5× data-efficiency gain. The gains persist with stronger backbone models and with frozen-feature linear-probe baselines.
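As a quick check on the reported figures, F1 is the harmonic mean of precision and recall, and the move from 0.615 to 0.739 works out to roughly a 20% relative gain. The sketch below is illustrative only; the function names are hypothetical and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction."""
    return (improved - baseline) / baseline


# Reported test F1 scores: real-only baseline vs. real + selected synthetic.
gain = relative_gain(0.615, 0.739)
print(f"relative F1 gain: {gain:.1%}")
```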

📝 Abstract

Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and inspection datasets are often limited or proprietary. We address this data-scarcity setting by using an off-the-shelf multimodal large language model (MLLM) as a training-free image generator to synthesize defect images from visual references and text prompts. Our pipeline increases diversity via dual-reference conditioning, improves label fidelity with lightweight human verification and prompt refinement, and filters the resulting synthetic pool using an embedding-based selection rule based on distances to class centroids computed from the real training split. We evaluate on ceramic insulator defect-type classification (shell vs. glaze) using a public dataset with a realistic low training-data regime (104 real training images; 152 validation; 308 test). Augmenting the 10% real training set with embedding-selected synthetic images improves test F1 score (harmonic mean of precision and recall) from 0.615 to 0.739 (20% relative), corresponding to an estimated 4–5× data-efficiency gain, and the gains persist with stronger backbone models and frozen-feature linear-probe baselines. These results suggest a practical, low-barrier path for improving defect recognition when collecting additional real defects is slow or infeasible.
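The embedding-based selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract specifies only that selection uses distances to class centroids computed from the real training split, so the Euclidean metric, the keep-the-nearest-fraction rule, the `keep_fraction` parameter, and all function names here are hypothetical. Embeddings are assumed to be precomputed by some image encoder.

```python
import numpy as np


def class_centroids(real_emb: np.ndarray, real_labels: np.ndarray) -> dict:
    """Mean embedding per class, computed from the real training split only."""
    return {c: real_emb[real_labels == c].mean(axis=0)
            for c in np.unique(real_labels)}


def select_synthetic(syn_emb: np.ndarray, syn_labels: np.ndarray,
                     centroids: dict, keep_fraction: float = 0.5) -> np.ndarray:
    """Return indices of synthetic samples kept after filtering.

    Assumed rule: within each class, keep the `keep_fraction` of synthetic
    samples whose embeddings lie closest (Euclidean) to the real centroid.
    """
    keep = []
    for c, centroid in centroids.items():
        idx = np.flatnonzero(syn_labels == c)
        if idx.size == 0:
            continue
        dist = np.linalg.norm(syn_emb[idx] - centroid, axis=1)
        n_keep = max(1, int(round(keep_fraction * idx.size)))
        keep.extend(idx[np.argsort(dist)[:n_keep]])
    return np.sort(np.array(keep, dtype=int))


# Toy 2-D embeddings: class 0 = shell, class 1 = glaze (labels are illustrative).
real_emb = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
real_labels = np.array([0, 0, 1, 1])
syn_emb = np.array([[0.0, 1.0], [5.0, 5.0], [10.0, 11.0], [20.0, 20.0]])
syn_labels = np.array([0, 0, 1, 1])

centroids = class_centroids(real_emb, real_labels)
selected = select_synthetic(syn_emb, syn_labels, centroids, keep_fraction=0.5)
print(selected)
```

In this toy case the synthetic samples far from their class centroid (indices 1 and 3) are dropped. A distance threshold could replace the per-class fraction; which variant the paper uses is not stated in the abstract.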

Problem

Research questions and friction points this paper is trying to address.

defect detection

data scarcity

insulator inspection

image classification

synthetic data

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic image generation

multimodal large language model

defect classification