🤖 AI Summary
To address the challenges of fine-grained identification and high annotation costs for non-ferrous impurities in steel scrap recycling, this paper proposes a few-shot pixel-level anomaly segmentation method based on vision-language models. Our approach jointly fine-tunes a multi-scale image encoder with semantically aligned textual prompts for normal and anomalous categories, enabling cross-modal feature alignment and class-aware segmentation. Through prompt engineering and multi-class supervised fine-tuning, we significantly enhance generalization to rare anomaly categories. Evaluated on real-world steel slag production line data, our method achieves an mIoU of 92.3%, outperforming the state-of-the-art industrial inspection model by 11.7 percentage points. This advancement enables high-purity intelligent sorting and supports carbon emission reduction in metal recycling operations.
📝 Abstract
Recycling steel scrap can reduce carbon dioxide (CO2) emissions from the steel industry. However, a significant challenge in steel scrap recycling is the inclusion of impurities other than steel. To address this issue, we propose vision-language-model-based anomaly detection where a model is finetuned in a supervised manner, enabling it to handle niche objects effectively. This model enables automated detection of anomalies at a fine-grained level within steel scrap. Specifically, we finetune the image encoder, equipped with multi-scale mechanism and text prompts aligned with both normal and anomaly images. The finetuning process trains these modules using a multiclass classification as the supervision.