🤖 AI Summary
This work addresses a challenge in industrial defect detection: subtle visual anomalies, such as hairline cracks and sub-millimeter voids, are frequently overlooked by active learning methods because they are rare and perceptually ambiguous. The authors propose GSAL, a framework that uses reconstruction discrepancy and denoising variability from diffusion models to quantify the visual difficulty of samples. GSAL also incorporates a three-level hierarchical semantic graph to guide semantic coverage, balancing visual ambiguity against semantic diversity. Across a proprietary thin-film defect dataset, Pascal VOC, and MS COCO, GSAL consistently outperforms uncertainty-, diversity-, and hybrid-based baselines in both label efficiency and rare-class retrieval.
📝 Abstract
Subtle visual anomalies such as hairline cracks, sub-millimeter voids, and low-contrast inclusions are structurally atypical yet visually ambiguous, making them both difficult to annotate and easy to overlook during active learning. Standard acquisition heuristics based on discriminative uncertainty or feature diversity often overselect dominant patterns while underexploring sparse yet important regions of the data space. This failure mode is especially severe in industrial defect inspection, where anomalies may be both low-prevalence and difficult to distinguish from surrounding structure. To address this, we propose GSAL, an active learning framework for object detection that combines a diffusion-based difficulty signal with a hierarchical semantic coverage prior. The diffusion component scores images and proposals using reconstruction discrepancy and denoising variability, prioritizing visually atypical or ambiguous examples. However, diffusion alone does not prevent acquisition from repeatedly favoring hard samples within dominant semantic modes. The semantic component therefore organizes candidate samples in a three-level concept graph and promotes coverage of underrepresented semantic regions while providing interpretable acquisition rationales. By balancing visual difficulty with semantic coverage, GSAL improves retrieval of subtle and rare targets that are often missed by uncertainty-only selection. Experiments on a proprietary thin-film defect dataset, Pascal VOC, and MS COCO show consistent gains in label efficiency and rare-class retrieval over uncertainty-based, diversity-based, and hybrid baselines.
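The interplay between the two signals can be illustrated with a toy acquisition loop. The sketch below is not the GSAL implementation, whose scoring functions and graph construction are not given here. It assumes that each candidate already comes with a list of reconstruction errors from repeated denoising runs and a three-level concept path. The names `difficulty`, `coverage_bonus`, `select`, the weight `lam`, and the additive combination are all hypothetical choices made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def difficulty(recon_errors):
    """Toy visual-difficulty score (assumed form, not the paper's).

    The mean error stands in for reconstruction discrepancy; the spread
    across denoising runs stands in for denoising variability.
    """
    return mean(recon_errors) + pstdev(recon_errors)


def prefixes(path):
    """Every node on a concept path, from the top level down to the leaf."""
    return [path[:k] for k in range(1, len(path) + 1)]


def coverage_bonus(path, counts):
    """Larger when few already-selected samples share this path's nodes."""
    return mean(1.0 / (1.0 + counts[p]) for p in prefixes(path))


def select(pool, budget, lam=1.0):
    """Greedily pick `budget` samples by difficulty + lam * coverage bonus."""
    counts = Counter()          # how often each graph node has been selected
    remaining = dict(pool)      # name -> (recon_errors, concept_path)
    chosen = []
    while remaining and len(chosen) < budget:
        def score(name):
            errors, path = remaining[name]
            return difficulty(errors) + lam * coverage_bonus(path, counts)

        best = max(remaining, key=score)
        chosen.append(best)
        _, path = remaining.pop(best)
        counts.update(prefixes(path))  # selected nodes now count as covered
    return chosen


# Hypothetical pool: two hard hairline cracks and one easier void.
pool = {
    "a": ([0.9, 1.1], ("defect", "crack", "hairline")),
    "b": ([0.8, 1.0], ("defect", "crack", "hairline")),
    "c": ([0.6, 0.8], ("defect", "void", "sub_mm")),
}
print(select(pool, budget=2, lam=1.0))
print(select(pool, budget=2, lam=0.0))
```

With these toy values, difficulty alone (`lam=0.0`) picks the two hairline cracks, while adding the coverage term (`lam=1.0`) replaces the second crack with the void sample. This mirrors the abstract's point that a difficulty signal by itself keeps returning to hard samples within a dominant semantic mode.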