🤖 AI Summary
To address the challenges of severe class imbalance and overlapping multi-defect patterns in semiconductor wafer map analysis, this paper proposes a lightweight Vision Transformer framework, ViT-Tiny, presented as the first adaptation of a streamlined Vision Transformer to wafer defect classification. By optimizing the patch-embedding strategy with a patch size of 16, the model achieves improved robustness and generalization under limited-sample conditions. Evaluated on the WM-38k dataset, the method attains a 98.4% F1-score on four-defect classification, outperforming MSF-Trans by 2.94%; it further improves two-defect recall by 2.86% and three-defect precision by 3.13%. The proposed approach balances high classification accuracy with low computational overhead, offering an efficient, deployable Vision Transformer paradigm for automated wafer quality inspection.
📝 Abstract
Semiconductor wafer defect classification is critical for ensuring high precision and yield in manufacturing. Traditional CNN-based models often struggle with class imbalance and the recognition of multiple overlapping defect types in wafer maps. To address these challenges, we propose ViT-Tiny, a lightweight Vision Transformer (ViT) framework optimized for wafer defect classification. Trained on the WM-38k dataset, ViT-Tiny outperforms its ViT-Base counterpart and state-of-the-art (SOTA) models such as MSF-Trans and CNN-based architectures. Through extensive ablation studies, we determine that a patch size of 16 provides optimal performance. ViT-Tiny achieves an F1-score of 98.4%, surpassing MSF-Trans by 2.94% in four-defect classification, improving recall by 2.86% in two-defect classification, and increasing precision by 3.13% in three-defect classification. Additionally, it demonstrates enhanced robustness under limited labeled data conditions, making it a computationally efficient and reliable solution for real-world semiconductor defect detection.
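The patch size of 16 identified in the ablation studies governs the patch-embedding step that turns a wafer map into transformer tokens. The following is a minimal NumPy sketch of that step, not the authors' implementation: the 224×224 input resolution and the use of a single-channel map are assumptions, while the embedding width of 192 is the standard ViT-Tiny hidden size; the projection matrix is random here, whereas in the real model it is learned.

```python
import numpy as np

PATCH = 16       # patch size found optimal in the paper's ablation study
EMBED_DIM = 192  # standard ViT-Tiny hidden size (assumed here)

def patch_embed(wafer_map: np.ndarray, rng=np.random.default_rng(0)) -> np.ndarray:
    """Split an (H, W) wafer map into non-overlapping 16x16 patches and
    linearly project each flattened patch to EMBED_DIM, as in a ViT
    patch embedding. Illustrative sketch only."""
    h, w = wafer_map.shape
    assert h % PATCH == 0 and w % PATCH == 0, "map must tile evenly into patches"
    # Rearrange into an (H/P * W/P, P*P) matrix of flattened patches.
    patches = (wafer_map
               .reshape(h // PATCH, PATCH, w // PATCH, PATCH)
               .transpose(0, 2, 1, 3)
               .reshape(-1, PATCH * PATCH))
    # Learned in the actual model; random projection here for illustration.
    proj = rng.standard_normal((PATCH * PATCH, EMBED_DIM)) / np.sqrt(PATCH * PATCH)
    return patches @ proj  # (num_tokens, EMBED_DIM)

tokens = patch_embed(np.zeros((224, 224)))
print(tokens.shape)  # (196, 192): 14x14 grid of patch tokens
```

A smaller patch size would yield more, finer-grained tokens (e.g. 784 at patch size 8) at quadratically higher attention cost, which is the trade-off the ablation over patch sizes explores.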