🤖 AI Summary
This study addresses the poor performance of defect-type classifiers caused by the scarcity of power line insulator defect images. It proposes a training-free synthetic data generation approach that uses an off-the-shelf multimodal large language model (MLLM) to synthesize defect images from visual references and text prompts, with dual-reference conditioning to increase diversity. To keep the synthetic data reliable, the method adds lightweight human verification with prompt refinement and filters the synthetic pool by embedding-space distance to class centroids computed from the real training split. Evaluated in a low-data regime with only 104 real training images, adding the selected synthetic data raises the test F1 score from 0.615 to 0.739 (about 20% relative), an estimated 4–5× data-efficiency gain. The gains persist with stronger backbone models and against frozen-feature linear-probe baselines.
📝 Abstract
Utility companies increasingly rely on drone imagery for post-event and routine inspection, but training accurate defect-type classifiers remains difficult because defect examples are rare and inspection datasets are often limited or proprietary. We address this data-scarcity setting by using an off-the-shelf multimodal large language model (MLLM) as a training-free image generator to synthesize defect images from visual references and text prompts. Our pipeline increases diversity via dual-reference conditioning, improves label fidelity with lightweight human verification and prompt refinement, and filters the resulting synthetic pool with an embedding-based selection rule that uses distances to class centroids computed from the real training split. We evaluate on ceramic insulator defect-type classification (shell vs. glaze) using a public dataset with a realistic low training-data regime (104 real training images; 152 validation; 308 test). Augmenting the 10% real training set with embedding-selected synthetic images improves the test F1 score (harmonic mean of precision and recall) from 0.615 to 0.739 (a 20% relative improvement), corresponding to an estimated 4–5× data-efficiency gain, and the gains persist with stronger backbone models and frozen-feature linear-probe baselines. These results suggest a practical, low-barrier path for improving defect recognition when collecting additional real defects is slow or infeasible.
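To make the embedding-based selection step concrete, the following is a minimal sketch of one way a centroid-distance filter could work. The abstract does not specify the embedding model, the distance metric, or the acceptance rule, so everything here is an illustrative assumption: Euclidean distance, the nearest-centroid consistency check, the `keep_fraction` parameter, and the function names `class_centroids` and `select_synthetic` are all hypothetical. Embeddings are assumed to be precomputed as NumPy arrays.

```python
import numpy as np


def class_centroids(real_emb, real_labels):
    """Mean embedding per class, computed from the real training split only."""
    return {c: real_emb[real_labels == c].mean(axis=0) for c in np.unique(real_labels)}


def select_synthetic(syn_emb, syn_labels, centroids, keep_fraction=0.5):
    """Return indices of synthetic samples to keep (assumed rule, not the paper's).

    A sample is a candidate only if the centroid of its intended class is also
    its nearest centroid. Among candidates of each class, the closest
    `keep_fraction` (rounded up) are kept.
    """
    classes = sorted(centroids)
    centroid_matrix = np.stack([centroids[c] for c in classes])  # (n_classes, dim)

    # Euclidean distance from every synthetic sample to every class centroid.
    dists = np.linalg.norm(syn_emb[:, None, :] - centroid_matrix[None, :, :], axis=2)

    nearest = np.array(classes)[dists.argmin(axis=1)]
    own_idx = np.array([classes.index(label) for label in syn_labels])
    own_dist = dists[np.arange(len(syn_labels)), own_idx]

    keep = []
    for c in classes:
        candidates = np.where((syn_labels == c) & (nearest == c))[0]
        n_keep = int(np.ceil(keep_fraction * len(candidates)))
        closest_first = candidates[np.argsort(own_dist[candidates])]
        keep.extend(closest_first[:n_keep])
    return np.array(sorted(keep), dtype=int)


# Toy 2-D embeddings; 0 = shell, 1 = glaze (label encoding is an assumption).
real_emb = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
real_labels = np.array([0, 0, 1, 1])
syn_emb = np.array([[0.0, 1.0], [1.0, 1.0], [9.0, 1.0], [4.0, 1.0], [10.0, 1.0], [6.0, 1.0]])
syn_labels = np.array([0, 0, 0, 0, 1, 1])

centroids = class_centroids(real_emb, real_labels)
kept = select_synthetic(syn_emb, syn_labels, centroids, keep_fraction=0.5)
print(kept.tolist())  # [0, 1, 4]
```

In this toy example, sample 2 is rejected because it is labeled shell but lies nearest to the glaze centroid, while samples 3 and 5 are dropped as the most distant candidates within their classes. Computing centroids from the real training split alone, as the abstract states, keeps validation and test images out of the selection rule.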