Anomaly Object Segmentation with Vision-Language Models for Steel Scrap Recycling

📅 2025-06-16

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

To address the challenges of fine-grained identification and high annotation costs for non-ferrous impurities in steel scrap recycling, this paper proposes a few-shot pixel-level anomaly segmentation method based on vision-language models. Our approach jointly fine-tunes a multi-scale image encoder with semantically aligned textual prompts for normal and anomalous categories, enabling cross-modal feature alignment and class-aware segmentation. Through prompt engineering and multi-class supervised fine-tuning, we significantly enhance generalization to rare anomaly categories. Evaluated on real-world steel slag production line data, our method achieves an mIoU of 92.3%, outperforming the state-of-the-art industrial inspection model by 11.7 percentage points. This advancement enables high-purity intelligent sorting and supports carbon emission reduction in metal recycling operations.

Technology Category

Application Category

📝 Abstract

Recycling steel scrap can reduce carbon dioxide (CO2) emissions from the steel industry. However, a significant challenge in steel scrap recycling is the inclusion of impurities other than steel. To address this issue, we propose vision-language-model-based anomaly detection where a model is finetuned in a supervised manner, enabling it to handle niche objects effectively. This model enables automated detection of anomalies at a fine-grained level within steel scrap. Specifically, we finetune the image encoder, equipped with multi-scale mechanism and text prompts aligned with both normal and anomaly images. The finetuning process trains these modules using a multiclass classification as the supervision.

Problem

Research questions and friction points this paper is trying to address.

Detecting impurities in steel scrap recycling

Fine-grained anomaly segmentation using vision-language models

Supervised finetuning for niche object detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language model for anomaly detection

Multi-scale mechanism in image encoder

Text prompts aligned with images

🔎 Similar Papers

No similar papers found.