🤖 AI Summary
Industrial surface defect detection is difficult because defects vary widely in morphology and scale, background textures interfere strongly, and fine-grained defects are hard to recognize. To address these challenges, we propose YOLO-FDA, an enhanced YOLO-based multi-scale defect detection framework. Our method introduces a Detail-directional Fusion Module (DDFM) built around a directional asymmetric convolution to improve sensitivity to minute defects. It also designs Attention-weighted Concatenation (AC) and Cross-layer Attention Fusion (CAF) to strengthen contextual modeling, and integrates a BiFPN-style architecture with hierarchical attention to better aggregate low-level details and high-level semantics. Extensive experiments on multiple benchmark datasets show that our approach outperforms state-of-the-art methods in mAP, small-object recall, and cross-scale robustness, achieving a favorable balance between detection accuracy and generalization capability.
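BiFPN-style aggregation usually combines feature maps from different pyramid levels with learnable, non-negative weights. The summary does not say which weighting rule YOLO-FDA uses, so the sketch below assumes the "fast normalized fusion" rule from EfficientDet's BiFPN, O = Σ_i w_i / (ε + Σ_j w_j) · I_i with w_i ≥ 0. Feature maps are flattened into plain Python lists and assumed to be already resized to a common shape. The function name `fast_normalized_fusion` and the example values are hypothetical.

```python
from typing import List


def fast_normalized_fusion(
    inputs: List[List[float]], raw_weights: List[float], eps: float = 1e-4
) -> List[float]:
    """Fuse same-shape (flattened) feature maps with normalized non-negative weights.

    Assumed rule (EfficientDet-style BiFPN fast normalized fusion, not confirmed
    as YOLO-FDA's exact choice):
        O = sum_i (w_i / (eps + sum_j w_j)) * I_i,  where w_i = max(0, raw_w_i)
    """
    if len(inputs) != len(raw_weights):
        raise ValueError("need exactly one weight per input feature map")
    weights = [max(0.0, w) for w in raw_weights]  # ReLU keeps each weight >= 0
    norm = eps + sum(weights)  # eps avoids division by zero if all weights are 0
    size = len(inputs[0])
    fused = [0.0] * size
    for feature, w in zip(inputs, weights):
        if len(feature) != size:
            raise ValueError("inputs must be resized to a common shape first")
        for k in range(size):
            fused[k] += (w / norm) * feature[k]
    return fused


# Usage: a top-down node fusing a lateral map with an upsampled coarser map.
p4_lateral = [1.0, 2.0, 3.0]
p5_upsampled = [3.0, 2.0, 1.0]
fused = fast_normalized_fusion([p4_lateral, p5_upsampled], [1.0, 1.0])
# With equal weights this is roughly the element-wise mean
# (each value is just under 2.0 here, because of eps).
```

This rule avoids the exponentials of softmax weighting while still keeping every normalized weight between 0 and 1. In a trained network the raw weights would be learnable parameters rather than fixed numbers.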
📝 Abstract
Surface defect detection in industrial scenarios is both crucial and technically demanding because of the wide variability in defect types, irregular shapes and sizes, fine-grained requirements, and complex material textures. Although recent AI-based detectors have improved performance, existing methods often suffer from redundant features, limited sensitivity to fine details, and weak robustness under multiscale conditions. To address these challenges, we propose YOLO-FDA, a novel YOLO-based detection framework that integrates fine-grained detail enhancement with attention-guided feature fusion. Specifically, we adopt a BiFPN-style architecture to strengthen bidirectional multilevel feature aggregation within the YOLOv5 backbone. To better capture fine structural changes, we introduce a Detail-directional Fusion Module (DDFM), which applies a directional asymmetric convolution to the second-lowest layer to enrich spatial details and then fuses that layer with low-level features to enhance semantic consistency. Furthermore, we propose two attention-based fusion strategies, Attention-weighted Concatenation (AC) and Cross-layer Attention Fusion (CAF), to improve contextual representation and reduce feature noise. Extensive experiments on benchmark datasets demonstrate that YOLO-FDA consistently outperforms existing state-of-the-art methods in both accuracy and robustness across diverse defect types and scales.
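The abstract does not give DDFM's kernel sizes or its fusion operator. The following is a minimal sketch assuming an asymmetric-convolution design in which a horizontal 1×k branch and a vertical k×1 branch are applied to the same map and summed, and the result is fused with a same-shape low-level map by element-wise addition. Maps are small nested Python lists, and the function names (`correlate2d_same`, `directional_asymmetric_conv`, `fuse_with_low_level`) are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def correlate2d_same(x: Grid, kernel: Grid) -> Grid:
    """2D cross-correlation (what deep-learning 'conv' layers compute) with zero
    padding, so the output keeps x's shape. Kernel sides are assumed to be odd."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    if 0 <= r < h and 0 <= c < w:  # zero padding outside the map
                        acc += kernel[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def directional_asymmetric_conv(x: Grid, row_kernel: List[float], col_kernel: List[float]) -> Grid:
    """Assumed DDFM core: sum of a horizontal (1 x k) and a vertical (k x 1) branch."""
    horizontal = correlate2d_same(x, [row_kernel])
    vertical = correlate2d_same(x, [[v] for v in col_kernel])
    return [[a + b for a, b in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]


def fuse_with_low_level(detail: Grid, low_level: Grid) -> Grid:
    """Assumed fusion: element-wise addition once both maps share a shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(detail, low_level)]


# Usage: a single active pixel reveals the cross-shaped receptive field.
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0
response = directional_asymmetric_conv(impulse, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])
# response is 2.0 at the center, 1.0 on the four direct neighbors, 0.0 elsewhere.
fused_map = fuse_with_low_level(response, impulse)
```

Each branch covers one orientation, so the pair responds to horizontal and vertical structures such as thin scratches using 2k weights instead of the k² of a full k×k kernel. In a deep-learning framework the same idea maps onto two convolutions, for example PyTorch's `nn.Conv2d` with `kernel_size=(1, k)` and `kernel_size=(k, 1)` and matching padding, with upsampling and channel alignment before fusing with the low-level features.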