AI Summary
This work addresses unsupervised foreign-object anomaly detection and pixel-level localization in highly unstructured coal conveyor-belt scenes, where randomly piled coal and gangue, complex backgrounds, low-contrast anomalies, and occlusion make the task difficult. To this end, we introduce CoalAD, a benchmark dataset built for this scenario, and propose a complementary-cue collaborative perception framework that fuses three sources of anomaly evidence: object-level semantic composition modeling, semantic-attribution-based global deviation analysis, and fine-grained texture matching. The fused outputs provide both image-level anomaly scoring and pixel-level localization. On CoalAD, our method outperforms widely used baselines on the evaluated image-level and pixel-level metrics, and ablation studies confirm the contribution of each component. The approach targets settings where the stability and regularity assumptions that conventional methods rely on in structured industrial environments no longer hold.
Abstract
Reliable foreign-object anomaly detection and pixel-level localization in conveyor-belt coal scenes are essential for safe and intelligent mining operations. The task is particularly challenging because the environment is highly unstructured: coal and gangue are randomly piled, backgrounds are complex and variable, and foreign objects often exhibit low contrast, deformation, and occlusion, so that they become coupled with their surroundings. These characteristics weaken the stability and regularity assumptions that many anomaly detection methods rely on in structured industrial settings, leading to notable performance degradation. To support evaluation and comparison in this setting, we construct CoalAD, a benchmark for unsupervised foreign-object anomaly detection with pixel-level localization in coal-stream scenes. We further propose a complementary-cue collaborative perception framework that extracts and fuses complementary anomaly evidence from three perspectives: object-level semantic composition modeling, semantic-attribution-based global deviation analysis, and fine-grained texture matching. The fused outputs provide robust image-level anomaly scoring and accurate pixel-level localization. Experiments on CoalAD demonstrate that our method outperforms widely used baselines across the evaluated image-level and pixel-level metrics, and ablation studies validate the contribution of each component. The code is available at https://github.com/xjpp2016/USAD.
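The abstract does not say how the three cues are fused, so the following is only an illustrative sketch of one common late-fusion scheme, not the authors' method. It assumes each branch emits a pixel-level score map of the same size, that each map is rescaled with fixed calibration bounds (for example, estimated on normal images), and that the image-level score is the maximum fused pixel score. All function names, bounds, weights, and the toy 2x2 maps are hypothetical.

```python
from typing import List, Sequence, Tuple

Map = List[List[float]]


def rescale(score_map: Map, lo: float, hi: float) -> Map:
    """Map raw scores to a shared scale using fixed calibration bounds lo < hi."""
    if hi <= lo:
        raise ValueError("calibration bounds must satisfy lo < hi")
    return [[(v - lo) / (hi - lo) for v in row] for row in score_map]


def fuse_maps(
    maps: Sequence[Map],
    bounds: Sequence[Tuple[float, float]],
    weights: Sequence[float],
) -> Map:
    """Weighted average of rescaled score maps that share the same shape."""
    if not (len(maps) == len(bounds) == len(weights)):
        raise ValueError("one (lo, hi) pair and one weight per map are required")
    total = sum(weights)
    scaled = [rescale(m, lo, hi) for m, (lo, hi) in zip(maps, bounds)]
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [
            sum(w * m[r][c] for w, m in zip(weights, scaled)) / total
            for c in range(width)
        ]
        for r in range(height)
    ]


def image_score(fused: Map) -> float:
    """Image-level score taken as the maximum fused pixel score (an assumption)."""
    return max(v for row in fused for v in row)


# Hypothetical 2x2 outputs of the three branches, each on its own raw scale.
composition_map = [[0.0, 0.25], [0.0, 0.5]]   # object-level composition cue
attribution_map = [[1.0, 1.0], [1.0, 3.0]]    # attribution-based deviation cue
texture_map = [[5.0, 5.0], [5.0, 10.0]]       # texture-matching cue

fused = fuse_maps(
    [composition_map, attribution_map, texture_map],
    bounds=[(0.0, 1.0), (1.0, 3.0), (5.0, 10.0)],  # assumed calibration ranges
    weights=[1.0, 1.0, 1.0],                       # assumed equal weighting
)
score = image_score(fused)
```

Fixed calibration bounds are used here instead of per-image min-max normalization because per-image normalization would stretch every map to the same peak and make image-level scores hard to compare across images.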