🤖 AI Summary
Industrial surface defect detection must cope with extreme foreground-background imbalance, sparse and elongated defects, and low contrast, so pixel-level optimization alone does not guarantee reliable sample-level quality control (QC) decisions. To address this, we propose a sample-centric multi-task learning framework that unifies sample-level defect classification and pixel-level segmentation: a shared encoder enables joint training, while sample-level supervision modulates feature distributions and, at the gradient level, boosts recall of small and low-contrast defects. We further introduce decision-linked evaluation metrics, Seg_mIoU and Seg_Recall, that remove the bias introduced by empty (defect-free) samples and couple localization quality with sample-level decisions. Evaluated on two benchmark datasets, our method substantially improves the stability of sample-level judgments and the completeness of defect localization, particularly reducing missed detections of elongated and sparse defects, and outperforms existing state-of-the-art approaches.
📝 Abstract
Industrial surface defect inspection for sample-wise quality control (QC) must simultaneously decide whether a given sample contains defects and localize those defects spatially. In real production lines, extreme foreground-background imbalance, defect sparsity with a long-tailed scale distribution, and low contrast are common. As a result, pixel-centric training and evaluation are easily dominated by large homogeneous regions, making it difficult to drive models to attend to small or low-contrast defects, which is one of the main bottlenecks for deployment. Empirically, existing models achieve strong pixel-overlap metrics (e.g., mIoU) but exhibit insufficient stability at the sample level, especially for sparse or slender defects. The root cause is a mismatch between the optimization objective and the granularity of QC decisions. To address this, we propose a sample-centric multi-task learning framework and evaluation suite. Built on a shared-encoder architecture, the method jointly learns sample-level defect classification and pixel-level mask localization. Sample-level supervision modulates the feature distribution and, at the gradient level, continually boosts recall for small and low-contrast defects, while the segmentation branch preserves boundary and shape details to enhance per-sample decision stability and reduce missed detections. For evaluation, we propose decision-linked metrics, Seg_mIoU and Seg_Recall, which remove the bias of classical mIoU caused by empty or true-negative samples and tightly couple localization quality with sample-level decisions. Experiments on two benchmark datasets demonstrate that our approach substantially improves the reliability of sample-level decisions and the completeness of defect localization.
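The abstract does not give formulas for Seg_mIoU and Seg_Recall, so the following is only a minimal sketch of one plausible reading, not the paper's definition. It assumes that IoU is computed per sample and averaged only over samples whose ground-truth mask contains a defect, and that a defective sample counts as recalled when its IoU reaches a threshold. The function names `sample_iou` and `defect_only_metrics` and the parameter `hit_iou` are hypothetical and introduced here for illustration.

```python
import numpy as np


def sample_iou(pred, gt):
    """IoU of two binary masks; returns None when both masks are empty."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return None  # empty sample: IoU is undefined, so the caller skips it
    return float(np.logical_and(pred, gt).sum() / union)


def defect_only_metrics(preds, gts, hit_iou=0.5):
    """Assumed (hypothetical) decision-linked metrics over defective samples.

    Returns a pair:
      - mean per-sample IoU over samples with a non-empty ground-truth mask
      - fraction of those samples whose IoU is at least `hit_iou`
    Both values are NaN when no sample contains a defect.
    """
    ious = []
    for pred, gt in zip(preds, gts):
        if not np.asarray(gt).any():
            continue  # defect-free ground truth: excluded from both metrics
        ious.append(sample_iou(pred, gt))
    if not ious:
        return float("nan"), float("nan")
    ious = np.array(ious)
    return float(ious.mean()), float((ious >= hit_iou).mean())
```

Under these assumptions, a correctly predicted defect-free sample contributes nothing to either score, so a test set dominated by empty samples cannot mask a model that misses every sparse defect.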