Anomaly Object Segmentation with Vision-Language Models for Steel Scrap Recycling

📅 2025-06-16
🤖 AI Summary
To address the challenges of fine-grained identification and high annotation costs for non-ferrous impurities in steel scrap recycling, this paper proposes a few-shot pixel-level anomaly segmentation method based on vision-language models. The approach jointly fine-tunes a multi-scale image encoder and semantically aligned textual prompts for normal and anomalous categories, enabling cross-modal feature alignment and class-aware segmentation. Prompt engineering and multi-class supervised fine-tuning improve generalization to rare anomaly categories. Evaluated on real-world steel scrap production line data, the method achieves an mIoU of 92.3%, outperforming the state-of-the-art industrial inspection model by 11.7 percentage points. This enables high-purity intelligent sorting and supports carbon emission reduction in metal recycling operations.
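For readers unfamiliar with the metric quoted in the summary, mean intersection-over-union (mIoU) averages the per-class overlap between predicted and ground-truth label maps. The sketch below is a generic illustration, not the paper's evaluation code; the function name `mean_iou`, the two-class labeling (0 = steel, 1 = impurity), and the toy label maps are assumptions made for the example.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: flat lists of integer class labels, one per pixel.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both maps agree on class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either map assigns class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps flattened row by row (hypothetical data):
# 0 = steel (normal), 1 = impurity (anomaly).
pred = [0, 0, 1, 0, 1, 1]
target = [0, 0, 1, 0, 0, 1]
score = mean_iou(pred, target, num_classes=2)
```

In this toy case the steel class has IoU 3/4 and the impurity class 2/3, so `score` is their average.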

📝 Abstract
Recycling steel scrap can reduce carbon dioxide (CO2) emissions from the steel industry. However, a significant challenge in steel scrap recycling is the inclusion of impurities other than steel. To address this issue, we propose vision-language-model-based anomaly detection in which the model is finetuned in a supervised manner, enabling it to handle niche objects effectively. This model enables automated, fine-grained detection of anomalies within steel scrap. Specifically, we finetune both the image encoder, which is equipped with a multi-scale mechanism, and the text prompts, which are aligned with normal and anomaly images. The finetuning process trains these modules using multiclass classification as supervision.
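The supervision described in the abstract can be pictured as scoring each image patch against one text embedding per class and applying a multiclass cross-entropy loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: cosine similarity, the temperature of 0.07, the three class prompts, and the 2-D toy embeddings are all hypothetical choices borrowed from common CLIP-style practice.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def class_logits(patch_feat, text_embs, temperature=0.07):
    """Cosine similarity between one patch feature and each class text embedding."""
    p = normalize(patch_feat)
    return [
        sum(a * b for a, b in zip(p, normalize(t))) / temperature
        for t in text_embs
    ]


def cross_entropy(logits, label):
    """Multiclass cross-entropy for a single patch (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]


# Hypothetical 2-D embeddings for three class prompts (assumed for illustration):
# index 0 = "steel scrap" (normal), 1 and 2 = two impurity classes.
text_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
patch_feat = [0.9, 0.1]  # toy image-encoder output for one patch

logits = class_logits(patch_feat, text_embs)
predicted = max(range(len(logits)), key=lambda i: logits[i])
loss_correct = cross_entropy(logits, label=0)
loss_wrong = cross_entropy(logits, label=1)
```

During finetuning, gradients of such a loss would flow into both the image encoder and the learnable prompts; here the patch is closest to the normal prompt, so labeling it as normal gives the lower loss.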
Problem

Research questions and friction points this paper is trying to address.

Detecting impurities in steel scrap recycling
Fine-grained anomaly segmentation using vision-language models
Supervised finetuning for niche object detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-language model for anomaly detection
Multi-scale mechanism in image encoder
Text prompts aligned with images
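The page does not say how the multi-scale mechanism combines information across scales. One common pattern, shown here purely as an assumed illustration, is to upsample per-scale anomaly score maps to a shared resolution, average them, and threshold the result. The helper names `upsample_nearest` and `fuse_scales`, the nearest-neighbor upsampling, and the 0.5 threshold are all hypothetical.

```python
def upsample_nearest(score_map, out_h, out_w):
    """Nearest-neighbor resize of a 2-D list of scores to out_h x out_w."""
    in_h, in_w = len(score_map), len(score_map[0])
    return [
        [score_map[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def fuse_scales(score_maps, out_h, out_w):
    """Average anomaly scores from several scales at a common resolution."""
    ups = [upsample_nearest(m, out_h, out_w) for m in score_maps]
    return [
        [sum(u[r][c] for u in ups) / len(ups) for c in range(out_w)]
        for r in range(out_h)
    ]


# Toy anomaly score maps at two scales (hypothetical values in [0, 1]).
coarse = [[0.2, 0.8]]                                  # 1x2 map
fine = [[0.0, 0.4, 0.6, 1.0], [0.0, 0.4, 0.6, 1.0]]    # 2x4 map

fused = fuse_scales([coarse, fine], out_h=2, out_w=4)
# Threshold the fused scores to obtain a binary anomaly mask.
mask = [[1 if s > 0.5 else 0 for s in row] for row in fused]
```

Averaging lets coarse maps contribute context while fine maps sharpen boundaries; a learned fusion would be an equally plausible design.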
Daichi Tanaka
Institute of Science Tokyo
Takumi Karasawa
The University of Tokyo → Tokyo Institute of Technology
Computer Vision, Object Detection
Shu Takenouchi
Institute of Science Tokyo
Rei Kawakami
Institute of Science Tokyo