Synthetic Defect Image Generation for Power Line Insulator Inspection Using Multimodal Large Language Models

📅 2026-03-09
🤖 AI Summary
This study addresses the performance limits that scarce power line insulator defect images impose on defect-type classifiers. It proposes a training-free synthetic data generation approach that uses an off-the-shelf multimodal large language model (MLLM) as the image generator, conditioned on visual references and text prompts, with dual-reference conditioning to increase the diversity of the generated defect images. To control synthetic data quality, the method adds lightweight human verification with prompt refinement and filters the synthetic pool by embedding-space distance to class centroids computed from the real training split. In a low-data regime with 104 real training images, adding the selected synthetic images raises the test F1 score from 0.615 to 0.739 (about 20% relative), an estimated 4–5× data-efficiency gain. The gains persist with stronger backbone models and with frozen-feature linear-probe baselines.
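The headline numbers can be checked with a few lines of arithmetic. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the helper names `f1_score` and `relative_gain` are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_gain(baseline: float, improved: float) -> float:
    """Fractional improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Reported test F1 scores: real-only baseline vs. real + selected synthetic images.
gain = relative_gain(0.615, 0.739)
print(f"relative F1 gain: {gain:.1%}")
```

The computed gain is roughly one fifth of the baseline score, consistent with the reported 20% relative improvement.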

📝 Abstract
Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and inspection datasets are often limited or proprietary. We address this data-scarcity setting by using an off-the-shelf multimodal large language model (MLLM) as a training-free image generator to synthesize defect images from visual references and text prompts. Our pipeline increases diversity via dual-reference conditioning, improves label fidelity with lightweight human verification and prompt refinement, and filters the resulting synthetic pool using an embedding-based selection rule based on distances to class centroids computed from the real training split. We evaluate on ceramic insulator defect-type classification (shell vs. glaze) using a public dataset with a realistic low training-data regime (104 real training images; 152 validation; 308 test). Augmenting the 10% real training set with embedding-selected synthetic images improves test F1 score (harmonic mean of precision and recall) from 0.615 to 0.739 (20% relative), corresponding to an estimated 4–5× data-efficiency gain, and the gains persist with stronger backbone models and frozen-feature linear-probe baselines. These results suggest a practical, low-barrier path for improving defect recognition when collecting additional real defects is slow or infeasible.
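The embedding-based selection rule can be illustrated with a minimal sketch. The abstract does not specify the embedding model, the distance metric, or the acceptance threshold, so everything below is an assumption: Euclidean distance, a per-class `keep_fraction` cutoff, and the hypothetical function names `class_centroids` and `select_synthetic`. The toy 2-D vectors stand in for real image embeddings.

```python
import math
from collections import defaultdict


def class_centroids(embeddings, labels):
    """Mean embedding per class, computed from the real training split."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {lbl: [s / counts[lbl] for s in total] for lbl, total in sums.items()}


def select_synthetic(real_emb, real_labels, syn_emb, syn_labels, keep_fraction=0.5):
    """Indices of synthetic samples closest to their own class centroid.

    Assumption: keep the nearest `keep_fraction` of each class (at least one),
    measured by Euclidean distance. The paper's actual rule may differ.
    """
    centroids = class_centroids(real_emb, real_labels)
    by_class = defaultdict(list)
    for i, (vec, label) in enumerate(zip(syn_emb, syn_labels)):
        if label in centroids:  # skip classes with no real examples
            by_class[label].append((math.dist(vec, centroids[label]), i))
    kept = []
    for scored in by_class.values():
        scored.sort()  # nearest to the centroid first
        n_keep = max(1, math.ceil(keep_fraction * len(scored)))
        kept.extend(i for _, i in scored[:n_keep])
    return sorted(kept)


# Toy example: two classes, two real and two synthetic embeddings each.
real_emb = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
real_labels = ["shell", "shell", "glaze", "glaze"]
syn_emb = [[0.1, 0.1], [0.9, 0.0], [1.0, 1.0], [0.0, 1.0]]
syn_labels = ["shell", "shell", "glaze", "glaze"]
kept = select_synthetic(real_emb, real_labels, syn_emb, syn_labels)
print(kept)
```

Computing centroids only from the real training split, as the abstract describes, keeps the validation and test images out of the filtering step. A fixed distance threshold would be an equally plausible alternative to the fractional cutoff assumed here.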
Problem

Research questions and friction points this paper is trying to address.

defect detection
data scarcity
insulator inspection
image classification
synthetic data
Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic image generation
multimodal large language model
defect classification
data scarcity
embedding-based selection
Xuesong Wang
Department of Electrical and Computer Engineering, Wayne State University, 42 W. Warren Ave., Detroit, 48201, MI, USA
Caisheng Wang
Wayne State University
Power and Energy