🤖 AI Summary
To address the challenges of heavy reliance on dense pixel-level annotations and poor adaptability to diverse industrial conditions in scrap sorting—particularly for foreign object detection and segmentation—this paper proposes a novel weakly supervised learning paradigm termed “before-after supervision,” leveraging image differences before and after human intervention. Methodologically, we design an end-to-end semantic segmentation framework integrating differential feature modeling and multi-view consistency constraints, enabling unified evaluation of diverse weak supervision strategies. Key contributions include: (1) the first formulation of operator removal actions as natural weak supervision signals; (2) the release of WS², the first multi-view, high-resolution weakly supervised dataset specifically designed for industrial scrap sorting; and (3) empirical validation on WS² demonstrating that multiple weakly supervised methods achieve over 90% of fully supervised performance, substantially reducing annotation costs while maintaining practical deployability.
📝 Abstract
In industrial quality control, human operators are often still indispensable for visually recognizing unwanted items within a moving heterogeneous stream. Waste sorting is a significant example: operators on multiple conveyor belts manually remove unwanted objects to select specific materials. Computer vision systems offer great potential to automate this recognition problem by accurately identifying and segmenting unwanted items in such settings. Unfortunately, given the multitude and variety of sorting tasks, fully supervised approaches are not a viable option for addressing this challenge, as they require extensive labeling efforts. Surprisingly, weakly supervised alternatives that leverage the implicit supervision naturally provided by the operator's removal actions are relatively unexplored. In this paper, we define the concept of Before-After Supervision, illustrating how to train a segmentation network by leveraging only the visual differences between images acquired \textit{before} and \textit{after} the operator. To promote research in this direction, we introduce WS$^2$ (Weakly Supervised segmentation for Waste-Sorting), the first multiview dataset consisting of more than 11 000 high-resolution video frames captured from above a conveyor belt, including "before" and "after" images. We also present a robust end-to-end pipeline, used to benchmark several state-of-the-art weakly supervised segmentation methods on WS$^2$.
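To make the idea of Before-After Supervision concrete, the sketch below shows one plausible way the acquisition stage of a frame could be turned into an image-level weak label, with no pixel masks involved. It is an illustration under stated assumptions, not the pipeline described in the paper: the `Frame` type, the `stage` field, the file paths, and the 1/0 label convention are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Frame:
    """A single conveyor-belt frame (hypothetical structure)."""
    path: str   # hypothetical image path
    stage: str  # "before" or "after" the operator


def weak_label(frame: Frame) -> int:
    """Map the acquisition stage to an image-level label.

    Assumed convention: 1 = frame may still contain unwanted items
    (acquired before the operator), 0 = frame is assumed clean
    (acquired after the operator). The "before" label is noisy, since
    a given frame may happen to contain no unwanted item.
    """
    if frame.stage == "before":
        return 1
    if frame.stage == "after":
        return 0
    raise ValueError(f"unknown stage: {frame.stage!r}")


def build_training_pairs(frames: List[Frame]) -> List[Tuple[str, int]]:
    """Return (path, label) pairs usable by an image-level classifier."""
    return [(f.path, weak_label(f)) for f in frames]


if __name__ == "__main__":
    frames = [
        Frame("cam0/before_0001.png", "before"),
        Frame("cam0/after_0001.png", "after"),
    ]
    for path, label in build_training_pairs(frames):
        print(path, label)
```

Labels of this kind could feed any weakly supervised segmentation method that works from image-level supervision, for instance by training a classifier and deriving localization maps from it. Which methods are actually benchmarked on WS$^2$ is specified in the paper, not here.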